Greene-50240 gree50240˙FM July 10, 2002 12:51

FIFTH EDITION
ECONOMETRIC ANALYSIS
William H. Greene
New York University
Upper Saddle River, New Jersey 07458

CIP data to come
Executive Editor: Rod Banister
Editor-in-Chief: P. J. Boardman
Managing Editor: Gladys Soto
Assistant Editor: Marie McHale
Editorial Assistant: Lisa Amato
Senior Media Project Manager: Victoria Anderson
Executive Marketing Manager: Kathleen McLellan
Marketing Assistant: Christopher Bath
Managing Editor (Production): Cynthia Regan
Production Editor: Michael Reynolds
Production Assistant: Dianne Falcone
Permissions Supervisor: Suzanne Grappi
Associate Director, Manufacturing: Vinnie Scelta
Cover Designer: Kiwi Design
Cover Photo: Anthony Bannister/Corbis
Composition: Interactive Composition Corporation
Printer/Binder: Courier/Westford
Cover Printer: Coral Graphics

Credits and acknowledgments borrowed from other sources and reproduced, with permission, in this textbook appear on appropriate page within text (or on page XX).

Copyright © 2003, 2000, 1997, 1993 by Pearson Education, Inc., Upper Saddle River, New Jersey, 07458. All rights reserved. Printed in the United States of America. This publication is protected by Copyright and permission should be obtained from the publisher prior to any prohibited reproduction, storage in a retrieval system, or transmission in any form or by any means, electronic, mechanical, photocopying, recording, or likewise. For information regarding permission(s), write to: Rights and Permissions Department.

Pearson Education LTD.
Pearson Education Australia PTY, Limited
Pearson Education Singapore, Pte. Ltd
Pearson Education North Asia Ltd
Pearson Education, Canada, Ltd
Pearson Educación de Mexico, S.A. de C.V.
Pearson Education–Japan
Pearson Education Malaysia, Pte. Ltd

10 9 8 7 6 5 4 3 2 1
ISBN 0-13-066189-9

BRIEF CONTENTS
Chapter 1 Introduction
Chapter 2 The Classical Multiple Linear Regression Model
Chapter 3 Least Squares
Chapter 4 Finite-Sample Properties of the Least Squares Estimator
Chapter 5 Large-Sample Properties of the Least Squares and Instrumental Variables Estimators
Chapter 6 Inference and Prediction
Chapter 7 Functional Form and Structural Change
Chapter 8 Specification Analysis and Model Selection
Chapter 9 Nonlinear Regression Models
Chapter 10 Nonspherical Disturbances—The Generalized Regression Model
Chapter 11 Heteroscedasticity
Chapter 12 Serial Correlation
Chapter 13 Models for Panel Data
Chapter 14 Systems of Regression Equations
Chapter 15 Simultaneous-Equations Models
Chapter 16 Estimation Frameworks in Econometrics
Chapter 17 Maximum Likelihood Estimation
Chapter 18 The Generalized Method of Moments
Chapter 19 Models with Lagged Variables
Chapter 20 Time-Series Models
Chapter 21 Models for Discrete Choice
Chapter 22 Limited Dependent Variable and Duration Models
Appendix A Matrix Algebra
Appendix B Probability and Distribution Theory
Appendix C Estimation and Inference
Appendix D Large Sample Distribution Theory
Appendix E Computation and Optimization
Appendix F Data Sets Used in Applications
Appendix G Statistical Tables
References
Author Index
Subject Index

CONTENTS

CHAPTER 1 Introduction
1.1 Econometrics
1.2 Econometric Modeling
1.3 Data and Methodology
1.4 Plan of the Book

CHAPTER 2 The Classical Multiple Linear Regression Model
2.1 Introduction
2.2 The Linear Regression Model
2.3 Assumptions of the Classical Linear Regression Model
2.3.1 Linearity of the Regression Model
2.3.2 Full Rank
2.3.3 Regression
2.3.4 Spherical Disturbances
2.3.5 Data Generating Process for the Regressors
2.3.6 Normality
2.4 Summary and Conclusions

CHAPTER 3 Least Squares
3.1 Introduction
3.2 Least Squares Regression
3.2.1 The Least Squares Coefficient Vector
3.2.2 Application: An Investment Equation
3.2.3 Algebraic Aspects of the Least Squares Solution
3.2.4 Projection
3.3 Partitioned Regression and Partial Regression
3.4 Partial Regression and Partial Correlation Coefficients
3.5 Goodness of Fit and the Analysis of Variance
3.5.1 The Adjusted R-Squared and a Measure of Fit
3.5.2 R-Squared and the Constant Term in the Model
3.5.3 Comparing Models
3.6 Summary and Conclusions

CHAPTER 4 Finite-Sample Properties of the Least Squares Estimator
4.1 Introduction
4.2 Motivating Least Squares
4.2.1 The Population Orthogonality Conditions
4.2.2 Minimum Mean Squared Error Predictor
4.2.3 Minimum Variance Linear Unbiased Estimation
4.3 Unbiased Estimation
4.4 The Variance of the Least Squares Estimator and the Gauss Markov Theorem
4.5 The Implications of Stochastic Regressors
4.6 Estimating the Variance of the Least Squares Estimator
4.7 The Normality Assumption and Basic Statistical Inference
4.7.1 Testing a Hypothesis About a Coefficient
4.7.2 Confidence Intervals for Parameters
4.7.3 Confidence Interval for a Linear Combination of Coefficients: The Oaxaca Decomposition
4.7.4 Testing the Significance of the Regression
4.7.5 Marginal Distributions of the Test Statistics
4.8 Finite-Sample Properties of Least Squares
4.9 Data Problems
4.9.1 Multicollinearity
4.9.2 Missing Observations
4.9.3 Regression Diagnostics and Influential Data Points
4.10 Summary and Conclusions

CHAPTER 5 Large-Sample Properties of the Least Squares and Instrumental Variables Estimators
5.1 Introduction
5.2 Asymptotic Properties of the Least Squares Estimator
5.2.1 Consistency of the Least Squares Estimator of β
5.2.2 Asymptotic Normality of the Least Squares Estimator
5.2.3 Consistency of s² and the Estimator of Asy. Var[b]
5.2.4 Asymptotic Distribution of a Function of b: The Delta Method
5.2.5 Asymptotic Efficiency
5.3 More General Cases
5.3.1 Heterogeneity in the Distributions of x_i
5.3.2 Dependent Observations
5.4 Instrumental Variable and Two Stage Least Squares Estimation
5.5 Hausman’s Specification Test and an Application to Instrumental Variable Estimation
5.6 Measurement Error
5.6.1 Least Squares Attenuation
5.6.2 Instrumental Variables Estimation
5.6.3 Proxy Variables
5.6.4 Application: Income and Education and a Study of Twins
5.7 Summary and Conclusions

CHAPTER 6 Inference and Prediction
6.1 Introduction
6.2 Restrictions and Nested Models
6.3 Two Approaches to Testing Hypotheses
6.3.1 The F Statistic and the Least Squares Discrepancy
6.3.2 The Restricted Least Squares Estimator
6.3.3 The Loss of Fit from Restricted Least Squares
6.4 Nonnormal Disturbances and Large Sample Tests
6.5 Testing Nonlinear Restrictions
6.6 Prediction
6.7 Summary and Conclusions

CHAPTER 7 Functional Form and Structural Change
7.1 Introduction
7.2 Using Binary Variables
7.2.1 Binary Variables in Regression
7.2.2 Several Categories
7.2.3 Several Groupings
7.2.4 Threshold Effects and Categorical Variables
7.2.5 Spline Regression
7.3 Nonlinearity in the Variables
7.3.1 Functional Forms
7.3.2 Identifying Nonlinearity
7.3.3 Intrinsic Linearity and Identification
7.4 Modeling and Testing for a Structural Break
7.4.1 Different Parameter Vectors
7.4.2 Insufficient Observations
7.4.3 Change in a Subset of Coefficients
7.4.4 Tests of Structural Break with Unequal Variances
7.5 Tests of Model Stability
7.5.1 Hansen’s Test
7.5.2 Recursive Residuals and the CUSUMS Test
7.5.3 Predictive Test
7.5.4 Unknown Timing of the Structural Break
7.6 Summary and Conclusions

CHAPTER 8 Specification Analysis and Model Selection
8.1 Introduction
8.2 Specification Analysis and Model Building
8.2.1 Bias Caused by Omission of Relevant Variables
8.2.2 Pretest Estimation
8.2.3 Inclusion of Irrelevant Variables
8.2.4 Model Building—A General to Simple Strategy
8.3 Choosing Between Nonnested Models
8.3.1 Testing Nonnested Hypotheses
8.3.2 An Encompassing Model
8.3.3 Comprehensive Approach—The J Test
8.3.4 The Cox Test
8.4 Model Selection Criteria
8.5 Summary and Conclusions

CHAPTER 9 Nonlinear Regression Models
9.1 Introduction
9.2 Nonlinear Regression Models
9.2.1 Assumptions of the Nonlinear Regression Model
9.2.2 The Orthogonality Condition and the Sum of Squares
9.2.3 The Linearized Regression
9.2.4 Large Sample Properties of the Nonlinear Least Squares Estimator
9.2.5 Computing the Nonlinear Least Squares Estimator
9.3 Applications
9.3.1 A Nonlinear Consumption Function
9.3.2 The Box–Cox Transformation
9.4 Hypothesis Testing and Parametric Restrictions
9.4.1 Significance Tests for Restrictions: F and Wald Statistics
9.4.2 Tests Based on the LM Statistic
9.4.3 A Specification Test for Nonlinear Regressions: The P_E Test
9.5 Alternative Estimators for Nonlinear Regression Models
9.5.1 Nonlinear Instrumental Variables Estimation
9.5.2 Two-Step Nonlinear Least Squares Estimation
9.5.3 Two-Step Estimation of a Credit Scoring Model
9.6 Summary and Conclusions

CHAPTER 10 Nonspherical Disturbances—The Generalized Regression Model
10.1 Introduction
10.2 Least Squares and Instrumental Variables Estimation
10.2.1 Finite-Sample Properties of Ordinary Least Squares
10.2.2 Asymptotic Properties of Least Squares
10.2.3 Asymptotic Properties of Nonlinear Least Squares
10.2.4 Asymptotic Properties of the Instrumental Variables Estimator
10.3 Robust Estimation of Asymptotic Covariance Matrices
10.4 Generalized Method of Moments Estimation
10.5 Efficient Estimation by Generalized Least Squares
10.5.1 Generalized Least Squares (GLS)
10.5.2 Feasible Generalized Least Squares
10.6 Maximum Likelihood Estimation
10.7 Summary and Conclusions

CHAPTER 11 Heteroscedasticity
11.1 Introduction
11.2 Ordinary Least Squares Estimation
11.2.1 Inefficiency of Least Squares
11.2.2 The Estimated Covariance Matrix of b
11.2.3 Estimating the Appropriate Covariance Matrix for Ordinary Least Squares
11.3 GMM Estimation of the Heteroscedastic Regression Model
11.4 Testing for Heteroscedasticity
11.4.1 White’s General Test
11.4.2 The Goldfeld–Quandt Test
11.4.3 The Breusch–Pagan/Godfrey LM Test
11.5 Weighted Least Squares When Ω is Known
11.6 Estimation When Ω Contains Unknown Parameters
11.6.1 Two-Step Estimation
11.6.2 Maximum Likelihood Estimation
11.6.3 Model Based Tests for Heteroscedasticity
11.7 Applications
11.7.1 Multiplicative Heteroscedasticity
11.7.2 Groupwise Heteroscedasticity
11.8 Autoregressive Conditional Heteroscedasticity
11.8.1 The ARCH(1) Model
11.8.2 ARCH(q), ARCH-in-Mean and Generalized ARCH Models
11.8.3 Maximum Likelihood Estimation of the GARCH Model
11.8.4 Testing for GARCH Effects
11.8.5 Pseudo-Maximum Likelihood Estimation
11.9 Summary and Conclusions

CHAPTER 12 Serial Correlation
12.1 Introduction
12.2 The Analysis of Time-Series Data
12.3 Disturbance Processes
12.3.1 Characteristics of Disturbance Processes
12.3.2 AR(1) Disturbances
12.4 Some Asymptotic Results for Analyzing Time Series Data
12.4.1 Convergence of Moments—The Ergodic Theorem
12.4.2 Convergence to Normality—A Central Limit Theorem
12.5 Least Squares Estimation
12.5.1 Asymptotic Properties of Least Squares
12.5.2 Estimating the Variance of the Least Squares Estimator
12.6 GMM Estimation
12.7 Testing for Autocorrelation
12.7.1 Lagrange Multiplier Test
12.7.2 Box and Pierce’s Test and Ljung’s Refinement
12.7.3 The Durbin–Watson Test
12.7.4 Testing in the Presence of a Lagged Dependent Variable
12.7.5 Summary of Testing Procedures
12.8 Efficient Estimation When Ω Is Known
12.9 Estimation When Ω Is Unknown
12.9.1 AR(1) Disturbances
12.9.2 AR(2) Disturbances
12.9.3 Application: Estimation of a Model with Autocorrelation
12.9.4 Estimation with a Lagged Dependent Variable
12.10 Common Factors
12.11 Forecasting in the Presence of Autocorrelation
12.12 Summary and Conclusions

CHAPTER 13 Models for Panel Data
13.1 Introduction
13.2 Panel Data Models
13.3 Fixed Effects
13.3.1 Testing the Significance of the Group Effects
13.3.2 The Within- and Between-Groups Estimators
13.3.3 Fixed Time and Group Effects
13.3.4 Unbalanced Panels and Fixed Effects
13.4 Random Effects
13.4.1 Generalized Least Squares
13.4.2 Feasible Generalized Least Squares When Σ Is Unknown
13.4.3 Testing for Random Effects
13.4.4 Hausman’s Specification Test for the Random Effects Model
13.5 Instrumental Variables Estimation of the Random Effects Model
13.6 GMM Estimation of Dynamic Panel Data Models
13.7 Nonspherical Disturbances and Robust Covariance Estimation
13.7.1 Robust Estimation of the Fixed Effects Model
13.7.2 Heteroscedasticity in the Random Effects Model
13.7.3 Autocorrelation in Panel Data Models
13.8 Random Coefficients Models
13.9 Covariance Structures for Pooled Time-Series Cross-Sectional Data
13.9.1 Generalized Least Squares Estimation
13.9.2 Feasible GLS Estimation
13.9.3 Heteroscedasticity and the Classical Model
13.9.4 Specification Tests
13.9.5 Autocorrelation
13.9.6 Maximum Likelihood Estimation
13.9.7 Application to Grunfeld’s Investment Data
13.9.8 Summary
13.10 Summary and Conclusions

CHAPTER 14 Systems of Regression Equations
14.1 Introduction
14.2 The Seemingly Unrelated Regressions Model
14.2.1 Generalized Least Squares
14.2.2 Seemingly Unrelated Regressions with Identical Regressors
14.2.3 Feasible Generalized Least Squares
14.2.4 Maximum Likelihood Estimation
14.2.5 An Application from Financial Econometrics: The Capital Asset Pricing Model
14.2.6 Maximum Likelihood Estimation of the Seemingly Unrelated Regressions Model with a Block of Zeros in the Coefficient Matrix
14.2.7 Autocorrelation and Heteroscedasticity
14.3 Systems of Demand Equations: Singular Systems
14.3.1 Cobb–Douglas Cost Function
14.3.2 Flexible Functional Forms: The Translog Cost Function
14.4 Nonlinear Systems and GMM Estimation
14.4.1 GLS Estimation
14.4.2 Maximum Likelihood Estimation
14.4.3 GMM Estimation
14.5 Summary and Conclusions

CHAPTER 15 Simultaneous-Equations Models
15.1 Introduction
15.2 Fundamental Issues in Simultaneous-Equations Models
15.2.1 Illustrative Systems of Equations
15.2.2 Endogeneity and Causality
15.2.3 A General Notation for Linear Simultaneous Equations Models
15.3 The Problem of Identification
15.3.1 The Rank and Order Conditions for Identification
15.3.2 Identification Through Other Nonsample Information
15.3.3 Identification Through Covariance Restrictions—The Fully Recursive Model
15.4 Methods of Estimation
15.5 Single Equation: Limited Information Estimation Methods
15.5.1 Ordinary Least Squares
15.5.2 Estimation by Instrumental Variables
15.5.3 Two-Stage Least Squares
15.5.4 GMM Estimation
15.5.5 Limited Information Maximum Likelihood and the k Class of Estimators
15.5.6 Two-Stage Least Squares in Models That Are Nonlinear in Variables
15.6 System Methods of Estimation
15.6.1 Three-Stage Least Squares
15.6.2 Full-Information Maximum Likelihood
15.6.3 GMM Estimation
15.6.4 Recursive Systems and Exactly Identified Equations
15.7 Comparison of Methods—Klein’s Model I
15.8 Specification Tests
15.9 Properties of Dynamic Models
15.9.1 Dynamic Models and Their Multipliers
15.9.2 Stability
15.9.3 Adjustment to Equilibrium
15.10 Summary and Conclusions

CHAPTER 16 Estimation Frameworks in Econometrics
16.1 Introduction
16.2 Parametric Estimation and Inference
16.2.1 Classical Likelihood Based Estimation
16.2.2 Bayesian Estimation
16.2.2.a Bayesian Analysis of the Classical Regression Model
16.2.2.b Point Estimation
16.2.2.c Interval Estimation
16.2.2.d Estimation with an Informative Prior Density
16.2.2.e Hypothesis Testing
16.2.3 Using Bayes Theorem in a Classical Estimation Problem: The Latent Class Model
16.2.4 Hierarchical Bayes Estimation of a Random Parameters Model by Markov Chain Monte Carlo Simulation
16.3 Semiparametric Estimation
16.3.1 GMM Estimation in Econometrics
16.3.2 Least Absolute Deviations Estimation
16.3.3 Partially Linear Regression
16.3.4 Kernel Density Methods
16.4 Nonparametric Estimation
16.4.1 Kernel Density Estimation
16.4.2 Nonparametric Regression
16.5 Properties of Estimators
16.5.1 Statistical Properties of Estimators
16.5.2 Extremum Estimators
16.5.3 Assumptions for Asymptotic Properties of Extremum Estimators
16.5.4 Asymptotic Properties of Estimators
16.5.5 Testing Hypotheses
16.6 Summary and Conclusions

CHAPTER 17 Maximum Likelihood Estimation
17.1 Introduction
17.2 The Likelihood Function and Identification of the Parameters
17.3 Efficient Estimation: The Principle of Maximum Likelihood
17.4 Properties of Maximum Likelihood Estimators
17.4.1 Regularity Conditions
17.4.2 Properties of Regular Densities
17.4.3 The Likelihood Equation
17.4.4 The Information Matrix Equality
17.4.5 Asymptotic Properties of the Maximum Likelihood Estimator
17.4.5.a Consistency
17.4.5.b Asymptotic Normality
17.4.5.c Asymptotic Efficiency
17.4.5.d Invariance
17.4.5.e Conclusion
17.4.6 Estimating the Asymptotic Variance of the Maximum Likelihood Estimator
17.4.7 Conditional Likelihoods and Econometric Models
17.5 Three Asymptotically Equivalent Test Procedures
17.5.1 The Likelihood Ratio Test
17.5.2 The Wald Test
17.5.3 The Lagrange Multiplier Test
17.5.4 An Application of the Likelihood Based Test Procedures
17.6 Applications of Maximum Likelihood Estimation
17.6.1 The Normal Linear Regression Model
17.6.2 Maximum Likelihood Estimation of Nonlinear Regression Models
17.6.3 Nonnormal Disturbances—The Stochastic Frontier Model
17.6.4 Conditional Moment Tests of Specification
17.7 Two-Step Maximum Likelihood Estimation
17.8 Maximum Simulated Likelihood Estimation
17.9 Pseudo-Maximum Likelihood Estimation and Robust Asymptotic Covariance Matrices
17.10 Summary and Conclusions

CHAPTER 18 The Generalized Method of Moments
18.1 Introduction
18.2 Consistent Estimation: The Method of Moments
18.2.1 Random Sampling and Estimating the Parameters of Distributions
18.2.2 Asymptotic Properties of the Method of Moments Estimator
18.2.3 Summary—The Method of Moments
18.3 The Generalized Method of Moments (GMM) Estimator
18.3.1 Estimation Based on Orthogonality Conditions
18.3.2 Generalizing the Method of Moments
18.3.3 Properties of the GMM Estimator
18.3.4 GMM Estimation of Some Specific Econometric Models
18.4 Testing Hypotheses in the GMM Framework
18.4.1 Testing the Validity of the Moment Restrictions
18.4.2 GMM Counterparts to the Wald, LM, and LR Tests
18.5 Application: GMM Estimation of a Dynamic Panel Data Model of Local Government Expenditures
18.6 Summary and Conclusions

CHAPTER 19 Models with Lagged Variables
19.1 Introduction
19.2 Dynamic Regression Models
19.2.1 Lagged Effects in a Dynamic Model
19.2.2 The Lag and Difference Operators
19.2.3 Specification Search for the Lag Length
19.3 Simple Distributed Lag Models
19.3.1 Finite Distributed Lag Models
19.3.2 An Infinite Lag Model: The Geometric Lag Model
19.4 Autoregressive Distributed Lag Models
19.4.1 Estimation of the ARDL Model
19.4.2 Computation of the Lag Weights in the ARDL Model
19.4.3 Stability of a Dynamic Equation
19.4.4 Forecasting
19.5 Methodological Issues in the Analysis of Dynamic Models
19.5.1 An Error Correction Model
19.5.2 Autocorrelation
19.5.3 Specification Analysis
19.5.4 Common Factor Restrictions
19.6 Vector Autoregressions
19.6.1 Model Forms
19.6.2 Estimation
19.6.3 Testing Procedures
19.6.4 Exogeneity
19.6.5 Testing for Granger Causality
19.6.6 Impulse Response Functions
19.6.7 Structural VARs
19.6.8 Application: Policy Analysis with a VAR
19.6.9 VARs in Microeconomics
19.7 Summary and Conclusions

CHAPTER 20 Time-Series Models
20.1 Introduction
20.2 Stationary Stochastic Processes
20.2.1 Autoregressive Moving-Average Processes
20.2.2 Stationarity and Invertibility
20.2.3 Autocorrelations of a Stationary Stochastic Process
20.2.4 Partial Autocorrelations of a Stationary Stochastic Process
20.2.5 Modeling Univariate Time Series
20.2.6 Estimation of the Parameters of a Univariate Time Series
20.2.7 The Frequency Domain
20.2.7.a Theoretical Results
20.2.7.b Empirical Counterparts
20.3 Nonstationary Processes and Unit Roots
20.3.1 Integrated Processes and Differencing
20.3.2 Random Walks, Trends, and Spurious Regressions
20.3.3 Tests for Unit Roots in Economic Data
20.3.4 The Dickey–Fuller Tests
20.3.5 Long Memory Models
20.4 Cointegration
20.4.1 Common Trends
20.4.2 Error Correction and VAR Representations
20.4.3 Testing for Cointegration
20.4.4 Estimating Cointegration Relationships
20.4.5 Application: German Money Demand
20.4.5.a Cointegration Analysis and a Long Run Theoretical Model
20.4.5.b Testing for Model Instability
20.5 Summary and Conclusions

CHAPTER 21 Models for Discrete Choice
21.1 Introduction
21.2 Discrete Choice Models
21.3 Models for Binary Choice
21.3.1 The Regression Approach
21.3.2 Latent Regression—Index Function Models
21.3.3 Random Utility Models
21.4 Estimation and Inference in Binary Choice Models
21.4.1 Robust Covariance Matrix Estimation
21.4.2 Marginal Effects
21.4.3 Hypothesis Tests
21.4.4 Specification Tests for Binary Choice Models
21.4.4.a Omitted Variables
21.4.4.b Heteroscedasticity
21.4.4.c A Specification Test for Nonnested Models—Testing for the Distribution
21.4.5 Measuring Goodness of Fit
21.4.6 Analysis of Proportions Data
21.5 Extensions of the Binary Choice Model
21.5.1 Random and Fixed Effects Models for Panel Data
21.5.1.a Random Effects Models
21.5.1.b Fixed Effects Models
21.5.2 Semiparametric Analysis
21.5.3 The Maximum Score Estimator (MSCORE)
21.5.4 Semiparametric Estimation
21.5.5 A Kernel Estimator for a Nonparametric Regression Function
21.5.6 Dynamic Binary Choice Models
21.6 Bivariate and Multivariate Probit Models
21.6.1 Maximum Likelihood Estimation
21.6.2 Testing for Zero Correlation
21.6.3 Marginal Effects
21.6.4 Sample Selection
21.6.5 A Multivariate Probit Model
21.6.6 Application: Gender Economics Courses in Liberal Arts Colleges
21.7 Logit Models for Multiple Choices
21.7.1 The Multinomial Logit Model
21.7.2 The Conditional Logit Model
21.7.3 The Independence from Irrelevant Alternatives
21.7.4 Nested Logit Models
21.7.5 A Heteroscedastic Logit Model
21.7.6 Multinomial Models Based on the Normal Distribution
21.7.7 A Random Parameters Model
21.7.8 Application: Conditional Logit Model for Travel Mode Choice
21.8 Ordered Data
21.9 Models for Count Data
21.9.1 Measuring Goodness of Fit
21.9.2 Testing for Overdispersion
21.9.3 Heterogeneity and the Negative Binomial Regression Model
21.9.4 Application: The Poisson Regression Model
21.9.5 Poisson Models for Panel Data
21.9.6 Hurdle and Zero-Altered Poisson Models
21.10 Summary and Conclusions

CHAPTER 22 Limited Dependent Variable and Duration Models
22.1 Introduction
22.2 Truncation
22.2.1 Truncated Distributions
22.2.2 Moments of Truncated Distributions
22.2.3 The Truncated Regression Model
22.3 Censored Data
22.3.1 The Censored Normal Distribution
22.3.2 The Censored Regression (Tobit) Model
22.3.3 Estimation
22.3.4 Some Issues in Specification
22.3.4.a Heteroscedasticity
22.3.4.b Misspecification of Prob[y* < 0]
22.3.4.c Nonnormality
22.3.4.d Conditional Moment Tests
22.3.5 Censoring and Truncation in Models for Counts
22.3.6 Application:
Censoring in the Tobit and Poisson Regression Models
22.4 The Sample Selection Model
22.4.1 Incidental Truncation in a Bivariate Distribution
22.4.2 Regression in a Model of Selection
22.4.3 Estimation
22.4.4 Treatment Effects
22.4.5 The Normality Assumption
22.4.6 Selection in Qualitative Response Models
22.5 Models for Duration Data
22.5.1 Duration Data
22.5.2 A Regression-Like Approach: Parametric Models of Duration
22.5.2.a Theoretical Background
22.5.2.b Models of the Hazard Function
22.5.2.c Maximum Likelihood Estimation
22.5.2.d Exogenous Variables
22.5.2.e Heterogeneity
22.5.3 Other Approaches
22.6 Summary and Conclusions

APPENDIX A Matrix Algebra
A.1 Terminology
A.2 Algebraic Manipulation of Matrices
A.2.1 Equality of Matrices
A.2.2 Transposition
A.2.3 Matrix Addition
A.2.4 Vector Multiplication
A.2.5 A Notation for Rows and Columns of a Matrix
A.2.6 Matrix Multiplication and Scalar Multiplication
A.2.7 Sums of Values
A.2.8 A Useful Idempotent Matrix
A.3 Geometry of Matrices
A.3.1 Vector Spaces
A.3.2 Linear Combinations of Vectors and Basis Vectors
A.3.3 Linear Dependence
A.3.4 Subspaces
A.3.5 Rank of a Matrix
A.3.6 Determinant of a Matrix
A.3.7 A Least Squares Problem
A.4 Solution of a System of Linear Equations
A.4.1 Systems of Linear Equations
A.4.2 Inverse Matrices
A.4.3 Nonhomogeneous Systems of Equations
A.4.4 Solving the Least Squares Problem
A.5 Partitioned Matrices
A.5.1 Addition and Multiplication of Partitioned Matrices
A.5.2 Determinants of Partitioned Matrices
A.5.3 Inverses of Partitioned Matrices
A.5.4 Deviations from Means
A.5.5 Kronecker Products
A.6 Characteristic Roots and Vectors
A.6.1 The Characteristic Equation
A.6.2 Characteristic Vectors
A.6.3 General Results for Characteristic Roots and Vectors
A.6.4 Diagonalization and Spectral Decomposition of a Matrix
A.6.5 Rank of a Matrix
A.6.6 Condition Number of a Matrix
A.6.7 Trace of a Matrix
A.6.8 Determinant of a Matrix
A.6.9 Powers of a Matrix
A.6.10 Idempotent Matrices
A.6.11 Factoring a Matrix
A.6.12 The Generalized Inverse of a Matrix
A.7 Quadratic Forms and Definite Matrices
A.7.1 Nonnegative Definite Matrices
A.7.2 Idempotent Quadratic Forms
A.7.3 Comparing Matrices
A.8 Calculus and Matrix Algebra
A.8.1 Differentiation and the Taylor Series
A.8.2 Optimization
A.8.3 Constrained Optimization
A.8.4 Transformations

APPENDIX B Probability and Distribution Theory
B.1 Introduction
B.2 Random Variables
B.2.1 Probability Distributions
B.2.2 Cumulative Distribution Function
B.3 Expectations of a Random Variable
B.4 Some Specific Probability Distributions
B.4.1 The Normal Distribution
B.4.2 The Chi-Squared, t, and F Distributions
B.4.3 Distributions With Large Degrees of Freedom
B.4.4 Size Distributions: The Lognormal Distribution
B.4.5 The Gamma and Exponential Distributions
B.4.6 The Beta Distribution
B.4.7 The Logistic Distribution
B.4.8 Discrete Random Variables
B.5 The Distribution of a Function of a Random Variable
B.6 Representations of a Probability Distribution
B.7 Joint Distributions
B.7.1 Marginal Distributions
B.7.2 Expectations in a Joint Distribution
B.7.3 Covariance and Correlation
B.7.4 Distribution of a Function of Bivariate Random Variables
B.8 Conditioning in a Bivariate Distribution
B.8.1 Regression: The Conditional Mean
B.8.2 Conditional Variance
B.8.3 Relationships Among Marginal and Conditional Moments
B.8.4 The Analysis of Variance
B.9 The Bivariate Normal Distribution
B.10 Multivariate Distributions
B.10.1 Moments
B.10.2 Sets of Linear Functions
B.10.3 Nonlinear Functions
B.11 The Multivariate Normal Distribution
B.11.1 Marginal and Conditional Normal Distributions
B.11.2 The Classical Normal Linear Regression Model
B.11.3 Linear Functions of a Normal Vector
B.11.4 Quadratic Forms in a Standard Normal Vector
B.11.5 The F Distribution
B.11.6 A Full Rank Quadratic Form
B.11.7 Independence of a Linear and a Quadratic Form

APPENDIX C Estimation and Inference
C.1 Introduction
C.2 Samples and Random Sampling
C.3 Descriptive Statistics
C.4 Statistics as Estimators—Sampling Distributions
C.5 Point Estimation of Parameters
C.5.1 Estimation in a Finite Sample
C.5.2 Efficient Unbiased Estimation
C.6 Interval Estimation
C.7 Hypothesis Testing
C.7.1 Classical Testing Procedures
C.7.2 Tests Based on Confidence Intervals
C.7.3 Specification Tests

APPENDIX D Large Sample Distribution Theory
D.1 Introduction
D.2 Large-Sample Distribution Theory
D.2.1 Convergence in Probability
D.2.2 Other Forms of Convergence and Laws of Large Numbers
D.2.3 Convergence of Functions
D.2.4 Convergence to a Random Variable
D.2.5 Convergence in Distribution: Limiting Distributions
D.2.6 Central Limit Theorems
D.2.7 The Delta Method
D.3 Asymptotic Distributions
D.3.1 Asymptotic Distribution of a Nonlinear Function
D.3.2 Asymptotic Expectations
D.4 Sequences and the Order of a Sequence

APPENDIX E Computation and Optimization
E.1 Introduction
E.2 Data Input and Generation
E.2.1 Generating Pseudo-Random Numbers
E.2.2 Sampling from a Standard Uniform Population
E.2.3 Sampling from Continuous Distributions
E.2.4 Sampling from a Multivariate Normal Population
E.2.5 Sampling from a Discrete Population
E.2.6 The Gibbs Sampler
E.3 Monte Carlo Studies
E.4 Bootstrapping and the Jackknife
E.5 Computation in Econometrics
E.5.1 Computing Integrals
E.5.2 The Standard Normal Cumulative Distribution Function
E.5.3 The Gamma and Related Functions
E.5.4 Approximating Integrals by Quadrature
E.5.5 Monte Carlo Integration
E.5.6 Multivariate Normal Probabilities and Simulated Moments
E.5.7 Computing Derivatives
E.6 Optimization
E.6.1 Algorithms
E.6.2 Gradient Methods
E.6.3 Aspects of Maximum Likelihood Estimation
E.6.4 Optimization with Constraints
E.6.5 Some Practical Considerations
E.6.6 Examples

APPENDIX F Data Sets Used in Applications
APPENDIX G Statistical Tables
References
Author Index
Subject Index

PREFACE

1. THE FIFTH EDITION OF ECONOMETRIC ANALYSIS

Econometric Analysis is intended for a one-year graduate course in econometrics for social scientists. The prerequisites for this course should include calculus, mathematical statistics, and an introduction to econometrics at the level of, say, Gujarati’s Basic Econometrics (McGraw-Hill, 1995) or Wooldridge’s Introductory Econometrics: A Modern Approach [South-Western (2000)]. Self-contained (for our purposes) summaries of the matrix algebra, mathematical statistics, and statistical theory used later in the book are given in Appendices A through D. Appendix E contains a description of numerical methods that will be useful to practicing econometricians. The formal presentation of econometrics begins with discussion of a fundamental pillar, the linear multiple regression model, in Chapters 2 through 8. Chapters 9 through 15 present familiar extensions of the single linear equation model, including nonlinear regression, panel data models, the generalized regression model, and systems of equations. The linear model is usually not the sole technique used in most of the contemporary literature. In view of this, the (expanding) second half of this book is devoted to topics that will extend the linear regression model in many directions. Chapters 16 through 18 present the techniques and underlying theory of estimation in econometrics, including GMM and maximum likelihood estimation methods and simulation based techniques. We end in the last four chapters, 19 through 22, with discussions of current topics in applied econometrics, including time-series analysis and the analysis of discrete choice and limited dependent variable models.

This book has two objectives. The first is to introduce students to applied econometrics, including basic techniques in regression analysis and some of the rich variety of models that are used when the linear model proves inadequate or inappropriate. The second is to present students with sufficient theoretical background that they will recognize new variants of the models learned about here as merely natural extensions that fit within a common body of principles. Thus, I have spent what might seem to be a large amount of effort explaining the mechanics of GMM estimation, nonlinear least squares, and maximum likelihood estimation and GARCH models. To meet the second objective, this book also contains a fair amount of theoretical material, such as that on maximum likelihood estimation and on asymptotic results for regression
models.Mod-ernsoftwarehasmadecomplicatedmodelingveryeasytodo,andanunderstandingoftheunderlyingtheoryisimportant.Ihadseveralpurposesinundertakingthisrevision.Asinthepast,readerscontinuetosendmeinterestingideasformy“nextedition.”Itisimpossibletousethemall,ofxxvii\nGreene-50240gree50240˙FMJuly10,200212:51xxviiiPrefacecourse.BecausethefivevolumesoftheHandbookofEconometricsandtwooftheHandbookofAppliedEconometricsalreadyruntoover4,000pages,itisalsounneces-sary.Nonetheless,thisrevisionisappropriateforseveralreasons.First,therearenewandinterestingdevelopmentsinthefield,particularlyintheareasofmicroeconometrics(paneldata,modelsfordiscretechoice)and,ofcourse,intimeseries,whichcontinuesitsrapiddevelopment.Second,Ihavetakentheopportunitytocontinuefine-tuningthetextastheexperienceandsharedwisdomofmyreadersaccumulatesinmyfiles.Forthisrevision,thatadjustmenthasentailedasubstantialrearrangementofthematerial—themainpurposeofthatwastoallowmetoaddthenewmaterialinamorecompactandorderlywaythanIcouldhavewiththetableofcontentsinthe4thedition.Thelitera-tureineconometricshascontinuedtoevolve,andmythirdobjectiveistogrowwithit.Thispurposeisinherentlydifficulttoaccomplishinatextbook.Mostoftheliteratureiswrittenbyprofessionalsforotherprofessionals,andthistextbookiswrittenforstudentswhoareintheearlystagesoftheirtraining.ButIdohopetoprovideabridgetothatliterature,boththeoreticalandapplied.Thisbookisabroadsurveyofthefieldofeconometrics.Thisfieldgrowscon-tinually,andsuchaneffortbecomesincreasinglydifficult.(Apartiallistofjournalsdevotedatleastinpart,ifnotcompletely,toeconometricsnowincludestheJournalofAppliedEconometrics,JournalofEconometrics,EconometricTheory,EconometricReviews,JournalofBusinessandEconomicStatistics,EmpiricalEconomics,andEcono-metrica.)Still,myviewhasalwaysbeenthattheseriousstudentofthefieldmuststartsomewhere,andonecansuccessfullyseekthatobjectiveinasingletextbook.Thistextattemptstosurvey,atanentrylevel,enoughofthefieldsineconometricsthatastudentcancomfortablymovefromheretopractic
e or more advanced study in one or more specialized areas. At the same time, I have tried to present the material in sufficient generality that the reader is also able to appreciate the important common foundation of all these fields and to use the tools that they all employ.

There are now quite a few recently published texts in econometrics. Several have gathered in compact, elegant treatises, the increasingly advanced and advancing theoretical background of econometrics. Others, such as this book, focus more attention on applications of econometrics. One feature that distinguishes this work from its predecessors is its greater emphasis on nonlinear models. [Davidson and MacKinnon (1993) is a noteworthy, but more advanced, exception.] Computer software now in wide use has made estimation of nonlinear models as routine as estimation of linear ones, and the recent literature reflects that progression. My purpose is to provide a textbook treatment that is in line with current practice. The book concludes with four lengthy chapters on time-series analysis, discrete choice models and limited dependent variable models. These nonlinear models are now the staples of the applied econometrics literature. This book also contains a fair amount of material that will extend beyond many first courses in econometrics, including, perhaps, the aforementioned chapters on limited dependent variables, the section in Chapter 22 on duration models, and some of the discussions of time series and panel data models. Once again, I have included these in the hope of providing a bridge to the professional literature in these areas.

I have had one overriding purpose that has motivated all five editions of this work. For the vast majority of readers of books such as this, whose ambition is to use, not develop econometrics, I believe that it is simply not sufficient to recite the theory of estimation, hypothesis testing and econometric analysis. Understanding the often subtle background theory is extremely important. But, at the end of the day, my purpose in writing this work, and for my continuing efforts to update it in this now fifth edition, is to show readers how to do econometric analysis. I unabashedly accept the unflattering assessment of a correspondent who once likened this book to a “user’
s guide to econometrics.”

2. SOFTWARE AND DATA

There are many computer programs that are widely used for the computations described in this book. All were written by econometricians or statisticians, and in general, all are regularly updated to incorporate new developments in applied econometrics. A sampling of the most widely used packages and the Internet home pages where you can find information about them is:

E-Views    www.eviews.com        (QMS, Irvine, Calif.)
Gauss      www.aptech.com        (Aptech Systems, Kent, Wash.)
LIMDEP     www.limdep.com        (Econometric Software, Plainview, N.Y.)
RATS       www.estima.com        (Estima, Evanston, Ill.)
SAS        www.sas.com           (SAS, Cary, N.C.)
Shazam     shazam.econ.ubc.ca    (Ken White, UBC, Vancouver, B.C.)
Stata      www.stata.com         (Stata, College Station, Tex.)
TSP        www.tspintl.com       (TSP International, Stanford, Calif.)

Programs vary in size, complexity, cost, the amount of programming required of the user, and so on. Journals such as The American Statistician, The Journal of Applied Econometrics, and The Journal of Economic Surveys regularly publish reviews of individual packages and comparative surveys of packages, usually with reference to particular functionality such as panel data analysis or forecasting.

With only a few exceptions, the computations described in this book can be carried out with any of these packages. We hesitate to link this text to any of them in particular. We have placed for general access a customized version of LIMDEP, which was also written by the author, on the website for this text, http://www.stern.nyu.edu/~wgreene/Text/econometricanalysis.htm. LIMDEP programs used for many of the computations are posted on the site as well.

The data sets used in the examples are also on the website. Throughout the text, these data sets are referred to as “Table Fn.m,” for example Table F4.1. The F refers to Appendix F at the back of the text, which contains descriptions of the data sets. The actual data are posted on the website with the other supplementary materials for the text. (The data sets are also replicated in the system format of most of the commonly used econometrics computer programs, including, in addition to LIMDEP, SAS, TSP, SPSS, E-Views, and Stata, so that you can easily import them into whatever program you might be using.)

I should also note, there are now thousands of interesting websites contai
ning software, data sets, papers, and commentary on econometrics. It would be hopeless to attempt any kind of a survey here. But, I do note one which is particularly agreeably structured and well targeted for readers of this book, the data archive for the Journal of Applied Econometrics. This journal publishes many papers that are precisely at the right level for readers of this text. They have archived all the nonconfidential data sets used in their publications since 1994. This useful archive can be found at http://qed.econ.queensu.ca/jae/.

3. ACKNOWLEDGEMENTS

It is a pleasure to express my appreciation to those who have influenced this work. I am grateful to Arthur Goldberger and Arnold Zellner for their encouragement, guidance, and always interesting correspondence. Dennis Aigner and Laurits Christensen were also influential in shaping my views on econometrics. Some collaborators to the earlier editions whose contributions remain in this one include Aline Quester, David Hensher, and Donald Waldman. The number of students and colleagues whose suggestions have helped to produce what you find here is far too large to allow me to thank them all individually. I would like to acknowledge the many reviewers of my work whose careful reading has vastly improved the book: Badi Baltagi, University of Houston; Neal Beck, University of California at San Diego; Diane Belleville, Columbia University; Anil Bera, University of Illinois; John Burkett, University of Rhode Island; Leonard Carlson, Emory University; Frank Chaloupka, City University of New York; Chris Cornwell, University of Georgia; Mitali Das, Columbia University; Craig Depken II, University of Texas at Arlington; Edward Dwyer, Clemson University; Michael Ellis, Wesleyan University; Martin Evans, New York University; Ed Greenberg, Washington University at St. Louis; Miguel Herce, University of North Carolina; K. Rao Kadiyala, Purdue University; Tong Li, Indiana University; Lubomir Litov, New York University; William Lott, University of Connecticut; Edward Mathis, Villanova University; Mary McGarvey, University of Nebraska-Lincoln; Ed Melnick, New York University; Thad Mirer, State University of New York at Albany; Paul Ruud, University of California at Berkeley; Sherrie Rhine, Chicago Feder
al Reserve Board; Terry G. Seaks, University of North Carolina at Greensboro; Donald Snyder, California State University at Los Angeles; Steven Stern, University of Virginia; Houston Stokes, University of Illinois at Chicago; Dimitrios Thomakos, Florida International University; Paul Wachtel, New York University; Mark Watson, Harvard University; and Kenneth West, University of Wisconsin. My numerous discussions with B. D. McCullough have improved Appendix E and at the same time increased my appreciation for numerical analysis. I am especially grateful to Jan Kiviet of the University of Amsterdam, who subjected my third edition to a microscopic examination and provided literally scores of suggestions, virtually all of which appear herein. Chapters 19 and 20 have also benefited from previous reviews by Frank Diebold, B. D. McCullough, Mary McGarvey, and Nagesh Revankar. I would also like to thank Rod Banister, Gladys Soto, Cindy Regan, Mike Reynolds, Marie McHale, Lisa Amato, and Torie Anderson at Prentice Hall for their contributions to the completion of this book. As always, I owe the greatest debt to my wife, Lynne, and to my daughters, Lesley, Allison, Elizabeth, and Julianna.

William H. Greene

1 INTRODUCTION

1.1 ECONOMETRICS

In the first issue of Econometrica, the Econometric Society stated that

its main object shall be to promote studies that aim at a unification of the theoretical-quantitative and the empirical-quantitative approach to economic problems and that are penetrated by constructive and rigorous thinking similar to that which has come to dominate the natural sciences.

But there are several aspects of the quantitative approach to economics, and no single one of these aspects taken by itself, should be confounded with econometrics. Thus, econometrics is by no means the same as economic statistics. Nor is it identical with what we call general economic theory, although a considerable portion of this theory has a definitely quantitative character. Nor should econometrics be taken as synonomous [sic] with the application of mathematics to economics. Experience has shown that each of these three viewpoints, that of statistics, economic theory, and mathematics, is a necessary, but not by itself a sufficient, condition for a real understanding of the quantitative r
elations in modern economic life. It is the unification of all three that is powerful. And it is this unification that constitutes econometrics.

Frisch (1933) and his society responded to an unprecedented accumulation of statistical information. They saw a need to establish a body of principles that could organize what would otherwise become a bewildering mass of data. Neither the pillars nor the objectives of econometrics have changed in the years since this editorial appeared. Econometrics is the field of economics that concerns itself with the application of mathematical statistics and the tools of statistical inference to the empirical measurement of relationships postulated by economic theory.

1.2 ECONOMETRIC MODELING

Econometric analysis will usually begin with a statement of a theoretical proposition. Consider, for example, a canonical application:

Example 1.1 Keynes’s Consumption Function

From Keynes’s (1936) General Theory of Employment, Interest and Money:

We shall therefore define what we shall call the propensity to consume as the functional relationship $f$ between $X$, a given level of income and $C$, the expenditure on consumption out of the level of income, so that $C = f(X)$.

The amount that the community spends on consumption depends (i) partly on the amount of its income, (ii) partly on other objective attendant circumstances, and (iii) partly on the subjective needs and the psychological propensities and habits of the individuals composing it. The fundamental psychological law upon which we are entitled to depend with great confidence, both a priori from our knowledge of human nature and from the detailed facts of experience, is that men are disposed, as a rule and on the average, to increase their consumption as their income increases, but not by as much as the increase in their income.¹ That is, . . . $dC/dX$ is positive and less than unity.

But, apart from short period changes in the level of income, it is also obvious that a higher absolute level of income will tend as a rule to widen the gap between income and consumption. . . . These reasons will lead, as a rule, to a greater proportion of income being saved as real income increases.

The theory asserts a relationship between consumption and income, $C = f(X)$, and claims in the third paragraph that the marginal p
ropensity to consume (MPC), $dC/dX$, is between 0 and 1. The final paragraph asserts that the average propensity to consume (APC), $C/X$, falls as income rises, or $d(C/X)/dX = (\text{MPC} - \text{APC})/X < 0$. It follows that MPC

$t_{\alpha/2}$, where $t_{\alpha/2}$ is the $100(1 - \alpha/2)$ percent critical value from the t distribution with $(n - K)$ degrees of freedom, then the hypothesis is rejected and the coefficient is said to be “statistically significant.” The value of 1.96, which would apply for the 5 percent significance level in a large sample, is often used as a benchmark value when a table of critical values is not immediately available. The t ratio for the test of the hypothesis that a coefficient equals zero is a standard part of the regression output of most computer programs.

² See (B-36) in Section B.4.2. It is the ratio of a standard normal variable to the square root of a chi-squared variable divided by its degrees of freedom.

Example 4.3 Earnings Equation

Appendix Table F4.1 contains 753 observations used in Mroz’s (1987) study of labor supply behavior of married women. We will use these data at several points below. Of the 753 individuals in the sample, 428 were participants in the formal labor market. For these individuals, we will fit a semilog earnings equation of the form suggested in Example 2.2;

\[
\ln \text{earnings} = \beta_1 + \beta_2\,\text{age} + \beta_3\,\text{age}^2 + \beta_4\,\text{education} + \beta_5\,\text{kids} + \varepsilon,
\]

where earnings is hourly wage times hours worked, education is measured in years of schooling and kids is a binary variable which equals one if there are children under 18 in the household. (See the data description in Appendix F for details.) Regression results are shown in Table 4.2. There are 428 observations and 5 parameters, so the t statistics have 423 degrees of freedom. For tests at the 5 percent significance level, the standard normal value of 1.96 is appropriate when the degrees of freedom are this large. By this measure, all variables are statistically significant and signs are consistent with expectations. It will be interesting to investigate whether the effect of Kids is on the wage or hours, or both. We interpret the schooling variable to imply that an additional year of schooling is associated with a 6.7 percent increase in earnings. The quadratic age profile suggests that for a given education level and family size, earnings rise to the peak at $-b_2/(2b_3)$, which is about 43 years of age, at which they begin to decline. Some points to note: (1) Our selection of only those individuals who had positive hours worked is not an innocent sample selection mechanism. Since individuals chose whether or not to be in the labor force, it is likely (almost certain) that earnings potential was a significant factor, along with some other aspects we will consider in Chapter 22. (2) The earnings equation is a mixture of a labor supply equation (hours worked by the individual) and a labor demand outcome (the wage is, presumably, an accepted offer). As such, it is unclear what the precise nature of this equation is. Presumably, it is a hash of the equations of an elaborate structural equation system.

TABLE 4.2 Regression Results for an Earnings Equation

Sum of squared residuals:            599.4582
Standard error of the regression:    1.19044
R² based on 428 observations:        0.040995

Variable     Coefficient    Standard Error    t Ratio
Constant      3.24009        1.7674            1.833
Age           0.20056        0.08386           2.392
Age²         −0.0023147      0.00098688       −2.345
Education     0.067472       0.025248          2.672
Kids         −0.35119        0.14753          −2.380

Estimated Covariance Matrix for b (e−n = times 10⁻ⁿ)

             Constant      Age            Age²           Education      Kids
Constant      3.12381
Age          −0.14409      0.0070325
Age²          0.0016617   −8.23237e−5     9.73928e−7
Education    −0.0092609    5.08549e−5    −4.96761e−7     0.00063729
Kids          0.026749    −0.0026412      3.84102e−5    −5.46193e−5     0.021766

4.7.2 CONFIDENCE INTERVALS FOR PARAMETERS

A confidence interval for $\beta_k$ would be based on (4-13). We could say that

\[
\text{Prob}(b_k - t_{\alpha/2}\, s_{b_k} \le \beta_k \le b_k + t_{\alpha/2}\, s_{b_k}) = 1 - \alpha,
\]

where $1 - \alpha$ is the desired level of confidence and $t_{\alpha/2}$ is the appropriate critical value from the t distribution with $(n - K)$ degrees of freedom.

Example 4.4 Confidence Interval for the Income Elasticity of Demand for Gasoline

Using the gasoline market data discussed in Example 2.3, we estimated the following demand equation using the 36 observations. Estimated standard errors, computed as shown above, are given in parentheses below the least squares estimates.

\[
\ln(G/\text{pop}) = \underset{(0.6749)}{-7.737} - \underset{(0.03248)}{0.05910}\,\ln P_G + \underset{(0.075628)}{1.3733}\,\ln \text{income} - \underset{(0.12699)}{0.12680}\,\ln P_{nc} - \underset{(0.081337)}{0.11871}\,\ln P_{uc} + e.
\]

To form a confidence interval for the income elasticity, we need the critical value from the t distribution with $n - K =$
$36 - 5$ degrees of freedom. The 95 percent critical value is 2.040. Therefore, a 95 percent confidence interval for $\beta_I$ is $1.3733 \pm 2.040(0.075628)$, or [1.2191, 1.5276].

We are interested in whether the demand for gasoline is income inelastic. The hypothesis to be tested is that $\beta_I$ is less than 1. For a one-sided test, we adjust the critical region and use the $t_\alpha$ critical point from the distribution. Values of the sample estimate that are greatly inconsistent with the hypothesis cast doubt upon it. Consider testing the hypothesis

\[
H_0: \beta_I < 1 \quad \text{versus} \quad H_1: \beta_I \ge 1.
\]

The appropriate test statistic is

\[
t = \frac{1.3733 - 1}{0.075628} = 4.936.
\]

The one-sided 5 percent critical value from the t distribution with 31 degrees of freedom is about 1.696, and even the two-sided value of 2.040 is far less than 4.936. We conclude that the data are not consistent with the hypothesis that the income elasticity is less than 1, so we reject the hypothesis.

4.7.3 CONFIDENCE INTERVAL FOR A LINEAR COMBINATION OF COEFFICIENTS: THE OAXACA DECOMPOSITION

With normally distributed disturbances, the least squares coefficient estimator, $\mathbf{b}$, is normally distributed with mean $\boldsymbol{\beta}$ and covariance matrix $\sigma^2(\mathbf{X}'\mathbf{X})^{-1}$. In Example 4.4, we showed how to use this result to form a confidence interval for one of the elements of $\boldsymbol{\beta}$. By extending those results, we can show how to form a confidence interval for a linear function of the parameters. Oaxaca’s (1973) decomposition provides a frequently used application.

Let $\mathbf{w}$ denote a $K \times 1$ vector of known constants. Then, the linear combination $c = \mathbf{w}'\mathbf{b}$ is normally distributed with mean $\gamma = \mathbf{w}'\boldsymbol{\beta}$ and variance $\sigma_c^2 = \mathbf{w}'[\sigma^2(\mathbf{X}'\mathbf{X})^{-1}]\mathbf{w}$, which we estimate with $s_c^2 = \mathbf{w}'[s^2(\mathbf{X}'\mathbf{X})^{-1}]\mathbf{w}$. With these in hand, we can use the earlier results to form a confidence interval for $\gamma$:

\[
\text{Prob}[c - t_{\alpha/2}\, s_c \le \gamma \le c + t_{\alpha/2}\, s_c] = 1 - \alpha.
\]

This general result can be used, for example, for the sum of the coefficients or for a difference.

Consider, then, Oaxaca’s application. In a study of labor supply, separate wage regressions are fit for samples of $n_m$ men and $n_f$ women. The underlying regression models are

\[
\ln \text{wage}_{m,i} = \mathbf{x}_{m,i}'\boldsymbol{\beta}_m + \varepsilon_{m,i}, \quad i = 1, \ldots, n_m,
\]

and

\[
\ln \text{wage}_{f,j} = \mathbf{x}_{f,j}'\boldsymbol{\beta}_f + \varepsilon_{f,j}, \quad j = 1, \ldots, n_f.
\]

The regressor vectors include sociodemographic variables, such as age, and human capital variables, such as education and experience. We are interested in comparing these two regressions, particula
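rly to see if they suggest wage discrimination. Oaxaca suggested a comparison of the regression functions. For any two vectors of characteristics,

\[
\begin{aligned}
E[\ln \text{wage}_{m,i}] - E[\ln \text{wage}_{f,j}] &= \mathbf{x}_{m,i}'\boldsymbol{\beta}_m - \mathbf{x}_{f,j}'\boldsymbol{\beta}_f \\
&= \mathbf{x}_{m,i}'\boldsymbol{\beta}_m - \mathbf{x}_{m,i}'\boldsymbol{\beta}_f + \mathbf{x}_{m,i}'\boldsymbol{\beta}_f - \mathbf{x}_{f,j}'\boldsymbol{\beta}_f \\
&= \mathbf{x}_{m,i}'(\boldsymbol{\beta}_m - \boldsymbol{\beta}_f) + (\mathbf{x}_{m,i} - \mathbf{x}_{f,j})'\boldsymbol{\beta}_f.
\end{aligned}
\]

The second term in this decomposition is identified with differences in human capital that would explain wage differences naturally, assuming that labor markets respond to these differences in ways that we would expect. The first term shows the differential in log wages that is attributable to differences unexplainable by human capital; holding these factors constant at $\mathbf{x}_m$ makes the first term attributable to other factors. Oaxaca suggested that this decomposition be computed at the means of the two regressor vectors, $\bar{\mathbf{x}}_m$ and $\bar{\mathbf{x}}_f$, and the least squares coefficient vectors, $\mathbf{b}_m$ and $\mathbf{b}_f$. If the regressions contain constant terms, then this process will be equivalent to analyzing $\overline{\ln y}_m - \overline{\ln y}_f$.

We are interested in forming a confidence interval for the first term, which will require two applications of our result. We will treat the two vectors of sample means as known vectors. Assuming that we have two independent sets of observations, our two estimators, $\mathbf{b}_m$ and $\mathbf{b}_f$, are independent with means $\boldsymbol{\beta}_m$ and $\boldsymbol{\beta}_f$ and covariance matrices $\sigma_m^2(\mathbf{X}_m'\mathbf{X}_m)^{-1}$ and $\sigma_f^2(\mathbf{X}_f'\mathbf{X}_f)^{-1}$. The covariance matrix of the difference is the sum of these two matrices. We are forming a confidence interval for $\bar{\mathbf{x}}_m'\mathbf{d}$ where $\mathbf{d} = \mathbf{b}_m - \mathbf{b}_f$. The estimated covariance matrix is

\[
\text{Est.Var}[\mathbf{d}] = s_m^2(\mathbf{X}_m'\mathbf{X}_m)^{-1} + s_f^2(\mathbf{X}_f'\mathbf{X}_f)^{-1}.
\]

Now, we can apply the result above. We can also form a confidence interval for the second term; just define $\mathbf{w} = \bar{\mathbf{x}}_m - \bar{\mathbf{x}}_f$ and apply the earlier result to $\mathbf{w}'\mathbf{b}_f$.

4.7.4 TESTING THE SIGNIFICANCE OF THE REGRESSION

A question that is usually of interest is whether the regression equation as a whole is significant. This test is a joint test of the hypotheses that all the coefficients except the constant term are zero. If all the slopes are zero, then the multiple correlation coefficient is zero as well, so we can base a test of this hypothesis on the value of $R^2$. The central result needed to carry out the test is the distribution of the statistic

\[
F[K-1,\, n-K] = \frac{R^2/(K-1)}{(1-R^2)/(n-K)}. \tag{4-15}
\]

If the hypothesis that $\boldsymbol{\beta}_2 = \mathbf{0}$ (the part of $\boldsymbol{\beta}$ not including the constant) is true and the disturbances are normally distributed, then this statistic has an F distribution w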
ith $K-1$ and $n-K$ degrees of freedom.³ Large values of F give evidence against the validity of the hypothesis. Note that a large F is induced by a large value of $R^2$. The logic of the test is that the F statistic is a measure of the loss of fit (namely, all of $R^2$) that results when we impose the restriction that all the slopes are zero. If F is large, then the hypothesis is rejected.

³ The proof of the distributional result appears in Section 6.3.1. The F statistic given above is the special case in which $\mathbf{R} = [\mathbf{0} \,|\, \mathbf{I}_{K-1}]$.

Example 4.5 F Test for the Earnings Equation

The F ratio for testing the hypothesis that the four slopes in the earnings equation are all zero is

\[
F[4, 423] = \frac{0.040995/4}{(1 - 0.040995)/(428 - 5)} = 4.521,
\]

which is far larger than the 95 percent critical value of 2.37. We conclude that the data are inconsistent with the hypothesis that all the slopes in the earnings equation are zero.

We might have expected the preceding result, given the substantial t ratios presented earlier. But this case need not always be true. Examples can be constructed in which the individual coefficients are statistically significant, while jointly they are not. This case can be regarded as pathological, but the opposite one, in which none of the coefficients is significantly different from zero while $R^2$ is highly significant, is relatively common. The problem is that the interaction among the variables may serve to obscure their individual contribution to the fit of the regression, whereas their joint effect may still be significant. We will return to this point in Section 4.9.1 in our discussion of multicollinearity.

4.7.5 MARGINAL DISTRIBUTIONS OF THE TEST STATISTICS

We now consider the relation between the sample test statistics and the data in $\mathbf{X}$. First, consider the conventional t statistic in (4-14) for testing $H_0: \beta_k = \beta_k^0$,

\[
t \,|\, \mathbf{X} = \frac{b_k - \beta_k^0}{\left[s^2(\mathbf{X}'\mathbf{X})^{-1}_{kk}\right]^{1/2}}.
\]

Conditional on $\mathbf{X}$, if $\beta_k = \beta_k^0$ (i.e., under $H_0$), then $t \,|\, \mathbf{X}$ has a t distribution with $(n - K)$ degrees of freedom. What interests us, however, is the marginal, that is, the unconditional, distribution of t. As we saw, $\mathbf{b}$ is only normally distributed conditionally on $\mathbf{X}$; the marginal distribution may not be normal because it depends on $\mathbf{X}$ (through the conditional variance). Similarly, because of the presence of $\mathbf{X}$, the denominator of
the t statistic is not the square root of a chi-squared variable divided by its degrees of freedom, again, except conditional on this $\mathbf{X}$. But, because the distributions of $(b_k - \beta_k)/[\sigma^2(\mathbf{X}'\mathbf{X})^{-1}_{kk}]^{1/2} \,|\, \mathbf{X}$ and $[(n-K)s^2/\sigma^2] \,|\, \mathbf{X}$ are still independent $N[0, 1]$ and $\chi^2[n-K]$, respectively, which do not involve $\mathbf{X}$, we have the surprising result that, regardless of the distribution of $\mathbf{X}$, or even of whether $\mathbf{X}$ is stochastic or nonstochastic, the marginal distribution of t is still t, even though the marginal distribution of $b_k$ may be nonnormal. This intriguing result follows because $f(t \,|\, \mathbf{X})$ is not a function of $\mathbf{X}$. The same reasoning can be used to deduce that the usual F ratio used for testing linear restrictions is valid whether $\mathbf{X}$ is stochastic or not. This result is very powerful. The implication is that if the disturbances are normally distributed, then we may carry out tests and construct confidence intervals for the parameters without making any changes in our procedures, regardless of whether the regressors are stochastic, nonstochastic, or some mix of the two.

4.8 FINITE-SAMPLE PROPERTIES OF LEAST SQUARES

A summary of the results we have obtained for the least squares estimator appears in Table 4.3. For constructing confidence intervals and testing hypotheses, we derived some additional results that depended explicitly on the normality assumption. Only FS7 depends on whether $\mathbf{X}$ is stochastic or not. If so, then the marginal distribution of $\mathbf{b}$ depends on that of $\mathbf{X}$. Note the distinction between the properties of $\mathbf{b}$ established using A1 through A4 and the additional inference results obtained with the further assumption of normality of the disturbances. The primary result in the first set is the Gauss–Markov theorem, which holds regardless of the distribution of the disturbances. The important additional results brought by the normality assumption are FS9 and FS10.

TABLE 4.3 Finite Sample Properties of Least Squares

General results:
FS1. $E[\mathbf{b} \,|\, \mathbf{X}] = E[\mathbf{b}] = \boldsymbol{\beta}$. Least squares is unbiased.
FS2. $\text{Var}[\mathbf{b} \,|\, \mathbf{X}] = \sigma^2(\mathbf{X}'\mathbf{X})^{-1}$; $\text{Var}[\mathbf{b}] = \sigma^2 E[(\mathbf{X}'\mathbf{X})^{-1}]$.
FS3. Gauss–Markov theorem: The MVLUE of $\mathbf{w}'\boldsymbol{\beta}$ is $\mathbf{w}'\mathbf{b}$.
FS4. $E[s^2 \,|\, \mathbf{X}] = E[s^2] = \sigma^2$.
FS5. $\text{Cov}[\mathbf{b}, \mathbf{e} \,|\, \mathbf{X}] = E[(\mathbf{b} - \boldsymbol{\beta})\mathbf{e}' \,|\, \mathbf{X}] = E[(\mathbf{X}'\mathbf{X})^{-1}\mathbf{X}'\boldsymbol{\varepsilon}\boldsymbol{\varepsilon}'\mathbf{M} \,|\, \mathbf{X}] = \mathbf{0}$ as $\mathbf{X}'(\sigma^2\mathbf{I})\mathbf{M} = \mathbf{0}$.

Results that follow from Assumption A6, normally distributed disturbances:
FS6. $\mathbf{b}$ and $\mathbf{e}$ are statistically independent. It follows that $\mathbf{b}$ and $s^2$ are uncorrelated and statistically independent.
FS7. The exact distribution of $\mathbf{b} \,|\, \mathbf{X}$ is $N[\boldsymbol{\beta}, \sigma^2(\mathbf{X}'\mathbf{X})^{-1}]$.
FS8. $(n-K)s^2/\sigma^2 \sim \chi^2[n-K]$. $s^2$ has mean $\sigma^2$ and variance $2\sigma^4/(n-K)$.

Test statistics based on results FS6 through FS8:
FS9. $t[n-K] = (b_k - \beta_k)/[s^2(\mathbf{X}'\mathbf{X})^{-1}_{kk}]^{1/2} \sim t[n-K]$ independently of $\mathbf{X}$.
FS10. The test statistic for testing the null hypothesis that all slopes in the model are zero, $F[K-1, n-K] = [R^2/(K-1)]/[(1-R^2)/(n-K)]$, has an F distribution with $K-1$ and $n-K$ degrees of freedom when the null hypothesis is true.

4.9 DATA PROBLEMS

In this section, we consider three practical problems that arise in the setting of regression analysis: multicollinearity, missing observations, and outliers.

4.9.1 MULTICOLLINEARITY

The Gauss–Markov theorem states that among all linear unbiased estimators, the least squares estimator has the smallest variance. Although this result is useful, it does not assure us that the least squares estimator has a small variance in any absolute sense. Consider, for example, a model that contains two explanatory variables and a constant. For either slope coefficient,

\[
\text{Var}[b_k] = \frac{\sigma^2}{\left(1 - r_{12}^2\right)\sum_{i=1}^{n}(x_{ik} - \bar{x}_k)^2} = \frac{\sigma^2}{\left(1 - r_{12}^2\right)S_{kk}}, \quad k = 1, 2. \tag{4-16}
\]

If the two variables are perfectly correlated, then the variance is infinite. The case of an exact linear relationship among the regressors is a serious failure of the assumptions of the model, not of the data. The more common case is one in which the variables are highly, but not perfectly, correlated. In this instance, the regression model retains all its assumed properties, although potentially severe statistical problems arise. The problems faced by applied researchers when regressors are highly, although not perfectly, correlated include the following symptoms:

• Small changes in the data produce wide swings in the parameter estimates.
• Coefficients may have very high standard errors and low significance levels even though they are jointly significant and the $R^2$ for the regression is quite high.
• Coefficients may have the “wrong” sign or implausible magnitudes.

For convenience, define the data matrix, $\mathbf{X}$, to contain a constant and $K-1$ other variables measured in deviations from their means. Let $\mathbf{x}_k$ denote the kth va
riable, and let $\mathbf{X}_{(k)}$ denote all the other variables (including the constant term). Then, in the inverse matrix, $(\mathbf{X}'\mathbf{X})^{-1}$, the kth diagonal element is

\[
\begin{aligned}
\left(\mathbf{x}_k'\mathbf{M}_{(k)}\mathbf{x}_k\right)^{-1} &= \left[\mathbf{x}_k'\mathbf{x}_k - \mathbf{x}_k'\mathbf{X}_{(k)}\left(\mathbf{X}_{(k)}'\mathbf{X}_{(k)}\right)^{-1}\mathbf{X}_{(k)}'\mathbf{x}_k\right]^{-1} \\
&= \left[\mathbf{x}_k'\mathbf{x}_k\left(1 - \frac{\mathbf{x}_k'\mathbf{X}_{(k)}\left(\mathbf{X}_{(k)}'\mathbf{X}_{(k)}\right)^{-1}\mathbf{X}_{(k)}'\mathbf{x}_k}{\mathbf{x}_k'\mathbf{x}_k}\right)\right]^{-1} \\
&= \frac{1}{\left(1 - R_{k.}^2\right)S_{kk}},
\end{aligned} \tag{4-17}
\]

where $R_{k.}^2$ is the $R^2$ in the regression of $\mathbf{x}_k$ on all the other variables. In the multiple regression model, the variance of the kth least squares coefficient estimator is $\sigma^2$ times this ratio. It then follows that the more highly correlated a variable is with the other variables in the model (collectively), the greater its variance will be. In the most extreme case, in which $\mathbf{x}_k$ can be written as a linear combination of the other variables so that $R_{k.}^2 = 1$, the variance becomes infinite. The result

\[
\text{Var}[b_k] = \frac{\sigma^2}{\left(1 - R_{k.}^2\right)\sum_{i=1}^{n}(x_{ik} - \bar{x}_k)^2}, \tag{4-18}
\]

shows the three ingredients of the precision of the kth least squares coefficient estimator:

• Other things being equal, the greater the correlation of $\mathbf{x}_k$ with the other variables, the higher the variance will be, due to multicollinearity.
• Other things being equal, the greater the variation in $\mathbf{x}_k$, the lower the variance will be. This result is shown in Figure 4.2.
• Other things being equal, the better the overall fit of the regression, the lower the variance will be. This result would follow from a lower value of $\sigma^2$. We have yet to develop this implication, but it can be suggested by Figure 4.2 by imagining the identical figure in the right panel but with all the points moved closer to the regression line.

Since nonexperimental data will never be orthogonal ($R_{k.}^2 = 0$), to some extent multicollinearity will always be present. When is multicollinearity a problem? That is, when are the variances of our estimates so adversely affected by this intercorrelation that we should be “concerned?” Some computer packages report a variance inflation factor (VIF), $1/(1 - R_{k.}^2)$, for each coefficient in a regression as a diagnostic statistic. As can be seen, the VIF for a variable shows the increase in $\text{Var}[b_k]$ that can be attributed to the fact that this variable is not orthogonal to the other variables in the model. Another measure that is specifically directed at $\mathbf{X}$ is the condition number of $\mathbf{X}'\mathbf{X}$, which is the square root of the ratio of the largest characteristic root of $\mathbf{X}'\mathbf{X}$ (after scaling each column so that it has unit length) to the smallest. Values in excess of 20 are suggested as indicative of a problem [Belsley, Kuh, and Welsch (1980)]. (The condition number for the Longley data of Example 4.6 is over 15,000!)

TABLE 4.4 Longley Results: Dependent Variable is Employment

                 1947–1961       Variance Inflation    1947–1962
Constant         1,459,415                             1,169,087
Year             −721.756        251.839               −576.464
GNP deflator     −181.123        75.6716               −19.7681
GNP              0.0910678       132.467               0.0643940
Armed Forces     −0.0749370      1.55319               −0.0101453

Example 4.6 Multicollinearity in the Longley Data

The data in Table F4.2 were assembled by J. Longley (1967) for the purpose of assessing the accuracy of least squares computations by computer programs. (These data are still widely used for that purpose.) The Longley data are notorious for severe multicollinearity. Note, for example, the last year of the data set. The last observation does not appear to be unusual. But, the results in Table 4.4 show the dramatic effect of dropping this single observation from a regression of employment on a constant and the other variables. The last coefficient rises by 600 percent, and the third rises by 800 percent.

Several strategies have been proposed for finding and coping with multicollinearity.⁴ Under the view that a multicollinearity “problem” arises because of a shortage of information, one suggestion is to obtain more data. One might argue that if analysts had such additional information available at the outset, they ought to have used it before reaching this juncture. More information need not mean more observations, however. The obvious practical remedy (and surely the most frequently used) is to drop variables suspected of causing the problem from the regression; that is, to impose on the regression an assumption, possibly erroneous, that the “problem” variable does not appear in the model. In doing so, one encounters the problems of specification that we will discuss in Section 8.2. If the variable that is dropped actually belongs in the model (in the sense that its coefficient, $\beta_k$, is not zero), then estimates of the remaining coefficients will be biased, possibly severely so. On the other hand, overfitting (that is, trying to estimate a model that is too large) is a common error, and dropping variables from an excessively specified model might have some virtue. Several other practical ap
proaches have also been suggested. The ridge regression estimator is $\mathbf{b}_r = [\mathbf{X}'\mathbf{X} + r\mathbf{D}]^{-1}\mathbf{X}'\mathbf{y}$, where $\mathbf{D}$ is a diagonal matrix. This biased estimator has a covariance matrix unambiguously smaller than that of $\mathbf{b}$. The tradeoff of some bias for smaller variance may be worth making (see Judge et al., 1985), but, nonetheless, economists are generally averse to biased estimators, so this approach has seen little practical use. Another approach sometimes used [see, e.g., Gurmu, Rilstone, and Stern (1999)] is to use a small number, say $L$, of principal components constructed from the $K$ original variables. [See Johnson and Wichern (1999).] The problem here is that if the original model in the form $\mathbf{y} = \mathbf{X}\boldsymbol{\beta} + \boldsymbol{\varepsilon}$ were correct, then it is unclear what one is estimating when one regresses $\mathbf{y}$ on some small set of linear combinations of the columns of $\mathbf{X}$. Algebraically, it is simple; at least for the principal components case, in which we regress $\mathbf{y}$ on $\mathbf{Z} = \mathbf{X}\mathbf{C}_L$ to obtain $\mathbf{d}$, it follows that $E[\mathbf{d}] = \boldsymbol{\delta} = \mathbf{C}_L\mathbf{C}_L'\boldsymbol{\beta}$. In an economic context, if $\boldsymbol{\beta}$ has an interpretation, then it is unlikely that $\boldsymbol{\delta}$ will. (How do we interpret the price elasticity plus minus twice the income elasticity?)

⁴ See Hill and Adkins (2001) for a description of the standard set of tools for diagnosing collinearity.

Using diagnostic tools to detect multicollinearity could be viewed as an attempt to distinguish a bad model from bad data. But, in fact, the problem only stems from a prior opinion with which the data seem to be in conflict. A finding that suggests multicollinearity is adversely affecting the estimates seems to suggest that but for this effect, all the coefficients would be statistically significant and of the right sign. Of course, this situation need not be the case. If the data suggest that a variable is unimportant in a model, then, the theory notwithstanding, the researcher ultimately has to decide how strong the commitment is to that theory. Suggested “remedies” for multicollinearity might well amount to attempts to force the theory on the data.

4.9.2 MISSING OBSERVATIONS

It is fairly common for a data set to have gaps, for a variety of reasons. Perhaps the most common occurrence of this problem is in survey data, in which it often happens that respondents simply fail to answer the questions. In a time series, the data
may be missing because they do not exist at the frequency we wish to observe them; for example, the model may specify monthly relationships, but some variables are observed only quarterly.

There are two possible cases to consider, depending on why the data are missing. One is that the data are simply unavailable, for reasons unknown to the analyst and unrelated to the completeness of the other observations in the sample. If this is the case, then the complete observations in the sample constitute a usable data set, and the only issue is what possibly helpful information could be salvaged from the incomplete observations. Griliches (1986) calls this the ignorable case in that, for purposes of estimation, if we are not concerned with efficiency, then we may simply ignore the problem. A second case, which has attracted a great deal of attention in the econometrics literature, is that in which the gaps in the data set are not benign but are systematically related to the phenomenon being modeled. This case happens most often in surveys when the data are “self-selected” or “self-reported.”⁵ For example, if a survey were designed to study expenditure patterns and if high-income individuals tended to withhold information about their income, then the gaps in the data set would represent more than just missing information. In this case, the complete observations would be qualitatively different. We treat this second case in Chapter 22, so we shall defer our discussion until later.

⁵ The vast surveys of Americans’ opinions about sex by Ann Landers (1984, passim) and Shere Hite (1987) constitute two celebrated studies that were surely tainted by a heavy dose of self-selection bias. The latter was pilloried in numerous publications for purporting to represent the population at large instead of the opinions of those strongly enough inclined to respond to the survey. The first was presented with much greater modesty.

In general, not much is known about the properties of estimators based on using predicted values to fill missing values of $y$. Those results we do have are largely from simulation studies based on a particular data set or pattern of missing data. The results of these Monte Carlo studies are usually difficult to generalize. The overall conclusion seems to be that in a single-equation regression context, filling in missing values of $y$ leads to biases in the estimator which are difficult to quantify.

For the case of missing data in the regressors, it helps to consider the simple regression and multiple regression cases separately. In the first case, $\mathbf{X}$ has two columns: the column of 1s for the constant and a column with some blanks where the missing data would be if we had them. Several schemes have been suggested for filling the blanks. The zero-order method of replacing each missing $x$ with $\bar{x}$ results in no changes and is equivalent to dropping the incomplete data. (See Exercise 7 in Chapter 3.) However, the $R^2$ will be lower. An alternative, modified zero-order regression is to fill the second column of $\mathbf{X}$ with zeros and add a variable that takes the value one for missing observations and zero for complete ones.⁶ We leave it as an exercise to show that this is algebraically identical to simply filling the gaps with $\bar{x}$. Last, there is the possibility of computing fitted values for the missing $x$’s by a regression of $x$ on $y$ in the complete data. The sampling properties of the resulting estimator are largely unknown, but what evidence there is suggests that this is not a beneficial way to proceed.⁷

4.9.3 REGRESSION DIAGNOSTICS AND INFLUENTIAL DATA POINTS

Even in the absence of multicollinearity or other data problems, it is worthwhile to examine one’s data closely for two reasons. First, the identification of outliers in the data is useful, particularly in relatively small cross sections in which the identity and perhaps even the ultimate source of the data point may be known. Second, it may be possible to ascertain which, if any, particular observations are especially influential in the results obtained. As such, the identification of these data points may call for further study. It is worth emphasizing, though, that there is a certain danger in singling out particular observations for scrutiny or even elimination from the sample on the basis of statistical results that are based on those data. At the extreme, this step may invalidate the usual inference procedures.

Of particular importance in this analysis is the projection matrix or hat matrix:

\[
\mathbf{P} = \mathbf{X}(\mathbf{X}'\mathbf{X})^{-1}\mathbf{X}'. \tag{4-19}
\]

This matrix appeared earlier as the matrix that projects any $n \times 1$ vector into the column space of $\mathbf{X}$. For any vector $\mathbf{y}$, $\mathbf{Py}$ is the set of fitted v
aluesintheleastsquaresregressionofyonX.Theleastsquaresresidualsaree=My=Mε=(I−P)ε,sothecovariancematrixfortheleastsquaresresidualvectorisE[ee]=σ2M=σ2(I−P).Toidentifywhichresidualsaresignificantlylarge,wefirststandardizethembydividing6SeeMaddala(1977a,p.202).7AfifiandElashoff(1966,1967)andHaitovsky(1968).Griliches(1986)considersanumberofotherpossibilities.\nGreene-50240bookJune3,20029:57CHAPTER4✦Finite-SamplePropertiesoftheLeastSquaresEstimator61StandardizedResiduals3.01.8.6Residual.61.83.0194619481950195219541956195819601962YEARFIGURE4.3StandardizedResidualsfortheLongleyData.bytheappropriatestandarddeviations.Thus,wewoulduseeieieˆi==,(4-20)[s2(1−pii)]1/2(s2mii)1/2whereeistheithleastsquaresresidual,s2=ee/(n−K),pistheithdiagonalelementiiiofPandmiiistheithdiagonalelementofM.Itiseasytoshow(weleaveitasanexercise)thate/m=y−xb(i)whereb(i)istheleastsquaresslopevectorcomputedwith-iiiiioutthisobservation,sothestandardizationisanaturalwaytoinvestigatewhethertheparticularobservationdifferssubstantiallyfromwhatshouldbeexpectedgiventhemodelspecification.Dividingbys2,orbetter,s(i)2scalestheobservationssothatthevalue2.0[suggestedbyBelsley,etal.(1980)]providesanappropriatebenchmark.Figure4.3illustratesfortheLongleydataofthepreviousexample.Apparently,1956wasanunusualyearaccordingtothis“model.”(Whattodowithoutliersisaquestion.Discardinganobservationinthemiddleofatimeseriesisprobablyabadidea,thoughwemayhopetolearnsomethingaboutthedatainthisway.Foracrosssection,onemaybeabletosingleoutobservationsthatdonotconformtothemodelwiththistechnique.)4.10SUMMARYANDCONCLUSIONSThischapterhasexaminedasetofpropertiesoftheleastsquaresestimatorthatwillapplyinallsamples,includingunbiasednessandefficiencyamongunbiasedestimators.Theassumptionofnormalityofthedisturbancesproducesthedistributionsofsomeusefulteststatisticswhichareusefulforastatisticalassessmentofthevalidityoftheregressionmodel.ThefinitesampleresultsobtainedinthischapterarelistedinTable4.3.\nGreene-50240bookJune3,20029:5762CHAPTER4✦Finite-Sample
PropertiesoftheLeastSquaresEstimatorWealsoconsideredsomepracticalproblemsthatarisewhendataarelessthanperfectfortheestimationandanalysisoftheregressionmodel,includingmulticollinearityandmissingobservations.Theformalassumptionsoftheclassicalmodelarepivotalintheresultsofthischapter.Allofthemarelikelytobeviolatedinmoregeneralsettingsthantheoneconsideredhere.Forexample,inmostcasesexaminedlaterinthebook,theestimatorhasapossiblebias,butthatbiasdiminisheswithincreasingsamplesizes.Also,wearegoingtobeinterestedinhypothesistestsofthetypeconsideredhere,butatthesametime,theassumptionofnormalityisnarrow,soitwillbenecessarytoextendthemodeltoallownonnormaldisturbances.Theseandother‘largesample’extensionsofthelinearmodelwillbeconsideredinChapter5.KeyTermsandConcepts•Assumptions•Minimumvariancelinear•Semiparametric•Conditionnumberunbiasedestimator•StandardError•Confidenceinterval•Missingobservations•Standarderrorofthe•Estimator•Multicollinearityregression•Gauss-MarkovTheorem•Oaxaca’sdecomposition•Statisticalproperties•Hatmatrix•Optimallinearpredictor•Stochasticregressors•Ignorablecase•Orthogonalrandom•tratio•Linearestimatorvariables•Linearunbiasedestimator•Principalcomponents•Meansquarederror•Projectionmatrix•Minimummeansquared•Samplingdistributionerror•SamplingvarianceExercises1.Supposethatyouhavetwoindependentunbiasedestimatorsofthesameparameterθ,sayθˆ1andθˆ2,withdifferentvariancesv1andv2.Whatlinearcombinationθˆ=c1θˆ1+c2θˆ2istheminimumvarianceunbiasedestimatorofθ?2.Considerthesimpleregressiony=βx+εwhereE[ε|x]=0andE[ε2|x]=σ2iiia.Whatistheminimummeansquarederrorlinearestimatorofβ?[Hint:Lettheestimatorbe[βˆ=cy].ChoosectominimizeVar[βˆ]+[E(βˆ−β)]2.Theanswerisafunctionoftheunknownparameters.]b.Fortheestimatorinparta,showthatratioofthemeansquarederrorofβˆtothatoftheordinaryleastsquaresestimatorbisMSE[βˆ]τ2β2=,whereτ2=.MSE[b](1+τ2)[σ2/xx]Notethatτisthesquareofthepopulationanalogtothe“tratio”fortestingthehypothesisthatβ=0,whichisgivenin(4-14).Howdoyouinterpretthebehaviorofthisratioasτ→∞?3.
Suppose that the classical regression model applies but that the true value of the constant is zero. Compare the variance of the least squares slope estimator computed without a constant term with that of the estimator computed with an unnecessary constant term.

4. Suppose that the regression model is $y_i = \alpha + \beta x_i + \varepsilon_i$, where the disturbances $\varepsilon_i$ have $f(\varepsilon_i) = (1/\lambda)\exp(-\varepsilon_i/\lambda)$, $\varepsilon_i \ge 0$. This model is rather peculiar in that all the disturbances are assumed to be positive. Note that the disturbances have $E[\varepsilon_i \mid x_i] = \lambda$ and $\operatorname{Var}[\varepsilon_i \mid x_i] = \lambda^2$. Show that the least squares slope is unbiased but that the intercept is biased.

5. Prove that the least squares intercept estimator in the classical regression model is the minimum variance linear unbiased estimator.

6. As a profit maximizing monopolist, you face the demand curve $Q = \alpha + \beta P + \varepsilon$. In the past, you have set the following prices and sold the accompanying quantities:

Q   3   3   7   6  10  15  16  13   9  15   9  15  12  18  21
P  18  16  17  12  15  15   4  13  11   6   8  10   7   7   7

Suppose that your marginal cost is 10. Based on the least squares regression, compute a 95 percent confidence interval for the expected value of the profit maximizing output.

7. The following sample moments for $x = [1, x_1, x_2, x_3]$ were computed from 100 observations produced using a random number generator:
\[ X'X = \begin{pmatrix} 100 & 123 & 96 & 109 \\ 123 & 252 & 125 & 189 \\ 96 & 125 & 167 & 146 \\ 109 & 189 & 146 & 168 \end{pmatrix}, \quad X'y = \begin{pmatrix} 460 \\ 810 \\ 615 \\ 712 \end{pmatrix}, \quad y'y = 3924. \]
The true model underlying these data is $y = x_1 + x_2 + x_3 + \varepsilon$.
a. Compute the simple correlations among the regressors.
b. Compute the ordinary least squares coefficients in the regression of $y$ on a constant, $x_1$, $x_2$, and $x_3$.
c. Compute the ordinary least squares coefficients in the regression of $y$ on a constant, $x_1$, and $x_2$; on a constant, $x_1$, and $x_3$; and on a constant, $x_2$, and $x_3$.
d. Compute the variance inflation factor associated with each variable.
e. The regressors are obviously collinear. Which is the problem variable?

8. Consider the multiple regression of $y$ on $K$ variables $X$ and an additional variable $z$. Prove that under the assumptions A1 through A6 of the classical regression model, the true variance of the least squares estimator of the slopes on $X$ is larger when $z$ is included in the regression than when it is not. Does the same hold for the sample estimate of this covariance matrix? Why or why not? Assume that $X$ and $z$ are nonstochastic and that the coefficient on $z$ is nonzero.

9. For the classical normal regression model $y = X\beta + \varepsilon$ with no constant term and $K$ regressors, assuming that the true value of $\beta$ is zero, what is the exact expected value of $F[K, n-K] = (R^2/K)/[(1-R^2)/(n-K)]$?

10. Prove that $E[b'b] = \beta'\beta + \sigma^2 \sum_{k=1}^{K} (1/\lambda_k)$, where $b$ is the ordinary least squares estimator and $\lambda_k$ is a characteristic root of $X'X$.

11. Data on U.S. gasoline consumption for the years 1960 to 1995 are given in Table F2.2.
a. Compute the multiple regression of per capita consumption of gasoline, G/pop, on all the other explanatory variables, including the time trend, and report all results. Do the signs of the estimates agree with your expectations?
b. Test the hypothesis that at least in regard to demand for gasoline, consumers do not differentiate between changes in the prices of new and used cars.
c. Estimate the own price elasticity of demand, the income elasticity, and the cross-price elasticity with respect to changes in the price of public transportation.
d. Reestimate the regression in logarithms so that the coefficients are direct estimates of the elasticities. (Do not use the log of the time trend.) How do your estimates compare with the results in the previous question? Which specification do you prefer?
e. Notice that the price indices for the automobile market are normalized to 1967, whereas the aggregate price indices are anchored at 1982. Does this discrepancy affect the results? How? If you were to renormalize the indices so that they were all 1.000 in 1982, then how would your results change?

5 LARGE-SAMPLE PROPERTIES OF THE LEAST SQUARES AND INSTRUMENTAL VARIABLES ESTIMATORS

5.1 INTRODUCTION

The discussion thus far has concerned finite-sample properties of the least squares estimator. We derived its exact mean and variance and the precise distribution of the estimator and several test statistics under the assumptions of normally distributed disturbances and independent observations. These results are independent of the sample size. But the classical regression model with normally distributed disturbances and independent observations is a special case that does not include many of the most common applications, such as panel data and most time series models. This chapter will generalize the classical regression model by relaxing these two important assumptions.$^1$

The linear model is one of relatively few settings in which any definite statements can be made about the exact finite sample properties of any estimator. In most cases, the only known properties of the estimators are those that apply to large samples. We can only approximate finite-sample behavior by using what we know about large-sample properties. This chapter will examine the asymptotic properties of the parameter estimators in the classical regression model. In addition to the least squares estimator, this chapter will also introduce an alternative technique, the method of instrumental variables. In this case, only the large sample properties are known.

$^1$ Most of this discussion will use our earlier results on asymptotic distributions. It may be helpful to review Appendix D before proceeding.

5.2 ASYMPTOTIC PROPERTIES OF THE LEAST SQUARES ESTIMATOR

Using only assumptions A1 through A4 of the classical model (as listed in Table 4.1), we have established that the least squares estimators of the unknown parameters, $\beta$ and $\sigma^2$, have the exact, finite-sample properties listed in Table 4.3. For this basic model, it is straightforward to derive the large-sample properties of the least squares estimator. The normality assumption, A6, becomes inessential at this point, and will be discarded save for brief discussions of maximum likelihood estimation in Chapters 10 and 17. This section will consider various forms of Assumption A5, the data generating mechanism.

5.2.1 CONSISTENCY OF THE LEAST SQUARES ESTIMATOR OF $\beta$

To begin, we leave the data generating mechanism for $X$ unspecified: $X$ may be any mixture of constants and random variables generated independently of the process that generates $\varepsilon$. We do make two crucial assumptions. The first is a modification of Assumption A5 in Table 4.1:

A5a. $(x_i, \varepsilon_i)$, $i = 1, \ldots, n$, is a sequence of independent observations.

The second concerns the behavior of the data in large samples:

\[ \operatorname*{plim}_{n \to \infty} \frac{X'X}{n} = Q, \text{ a positive definite matrix.} \tag{5-1} \]

[We will return to (5-1) shortly.] The least squares estimator may be written

\[ b = \beta + \left(\frac{X'X}{n}\right)^{-1} \left(\frac{X'\varepsilon}{n}\right). \tag{5-2} \]

If $Q^{-1}$ exists, then

\[ \operatorname{plim} b = \beta + Q^{-1} \operatorname{plim}\left(\frac{X'\varepsilon}{n}\right) \]

because the inverse is a continuous function of the original matrix. (We have invoked Theorem D.14.) We require the probability limit of the last term. Let

\[ \frac{1}{n} X'\varepsilon = \frac{1}{n}\sum_{i=1}^{n} x_i \varepsilon_i = \frac{1}{n}\sum_{i=1}^{n} w_i = \bar{w}. \tag{5-3} \]

Then

\[ \operatorname{plim} b = \beta + Q^{-1} \operatorname{plim} \bar{w}. \]

From the exogeneity Assumption A3, we have $E[w_i] = E_x[E[w_i \mid x_i]] = E_x[x_i E[\varepsilon_i \mid x_i]] = 0$, so the exact expectation is $E[\bar{w}] = 0$. For any element in $x_i$ that is nonstochastic, the zero expectations follow from the marginal distribution of $\varepsilon_i$. We now consider the variance. By (B-70), $\operatorname{Var}[\bar{w}] = E[\operatorname{Var}[\bar{w} \mid X]] + \operatorname{Var}[E[\bar{w} \mid X]]$. The second term is zero because $E[\varepsilon_i \mid x_i] = 0$. To obtain the first, we use $E[\varepsilon\varepsilon' \mid X] = \sigma^2 I$, so

\[ \operatorname{Var}[\bar{w} \mid X] = E[\bar{w}\bar{w}' \mid X] = \frac{1}{n} X' E[\varepsilon\varepsilon' \mid X] X \frac{1}{n} = \left(\frac{\sigma^2}{n}\right)\left(\frac{X'X}{n}\right). \]

Therefore,

\[ \operatorname{Var}[\bar{w}] = \left(\frac{\sigma^2}{n}\right) E\left(\frac{X'X}{n}\right). \]

The variance will collapse to zero if the expectation in parentheses is (or converges to) a constant matrix, so that the leading scalar will dominate the product as $n$ increases. Assumption (5-1) should be sufficient. (Theoretically, the expectation could diverge while the probability limit does not, but this case would not be relevant for practical purposes.) It then follows that

\[ \lim_{n \to \infty} \operatorname{Var}[\bar{w}] = 0 \cdot Q = 0. \]

Since the mean of $\bar{w}$ is identically zero and its variance converges to zero, $\bar{w}$ converges in mean square to zero, so $\operatorname{plim} \bar{w} = 0$. Therefore,

\[ \operatorname{plim} \frac{X'\varepsilon}{n} = 0, \tag{5-4} \]

so

\[ \operatorname{plim} b = \beta + Q^{-1} \cdot 0 = \beta. \tag{5-5} \]

This result establishes that under Assumptions A1–A4 and the additional assumption (5-1), $b$ is a consistent estimator of $\beta$ in the classical regression model.

Time-series settings that involve time trends, polynomial time series, and trending variables often pose cases in which the preceding assumptions are too restrictive. A somewhat weaker set of assumptions about $X$ that is broad enough to include most of these is the Grenander conditions listed in Table 5.1.$^2$ The conditions ensure that the data matrix is “well behaved” in large samples. The assumptions are very weak and are likely to be satisfied by almost any data set encountered in practice.$^3$

$^2$ Judge et al. (1985, p. 162).

$^3$ White (2001) continues this line of analysis.

TABLE 5.1 Grenander Conditions for Well Behaved Data

G1. For each column of $X$, $x_k$, if $d_{nk}^2 = x_k'x_k$, then $\lim_{n \to \infty} d_{nk}^2 = +\infty$. Hence, $x_k$ does not degenerate to a sequence of zeros. Sums of squares will continue to grow as the sample size increases. No variable will degenerate to a sequence of zeros.
G2. $\lim_{n \to \infty} x_{ik}^2 / d_{nk}^2 = 0$ for all $i = 1, \ldots, n$. This condition implies that no single observation will ever dominate $x_k'x_k$, and as $n \to \infty$, individual observations will become less important.
G3. Let $R_n$ be the sample correlation matrix of the columns of $X$, excluding the constant term if there is one. Then $\lim_{n \to \infty} R_n = C$, a positive definite matrix. This condition implies that the full rank condition will always be met. We have already assumed that $X$ has full rank in a finite sample, so this assumption ensures that the condition will never be violated.

5.2.2 ASYMPTOTIC NORMALITY OF THE LEAST SQUARES ESTIMATOR

To derive the asymptotic distribution of the least squares estimator, we shall use the results of Section D.3. We will make use of some basic central limit theorems, so in addition to Assumption A3 (uncorrelatedness), we will assume that the observations are independent. It follows from (5-2) that

\[ \sqrt{n}(b - \beta) = \left(\frac{X'X}{n}\right)^{-1} \left(\frac{1}{\sqrt{n}}\right) X'\varepsilon. \tag{5-6} \]

Since the inverse matrix is a continuous function of the original matrix, $\operatorname{plim}(X'X/n)^{-1} = Q^{-1}$. Therefore, if the limiting distribution of the random vector in (5-6) exists, then that limiting distribution is the same as that of

\[ \left[\operatorname{plim}\left(\frac{X'X}{n}\right)^{-1}\right] \left(\frac{1}{\sqrt{n}}\right) X'\varepsilon = Q^{-1} \left(\frac{1}{\sqrt{n}}\right) X'\varepsilon. \tag{5-7} \]

Thus, we must establish the limiting distribution of

\[ \left(\frac{1}{\sqrt{n}}\right) X'\varepsilon = \sqrt{n}\,\bigl(\bar{w} - E[\bar{w}]\bigr), \tag{5-8} \]

where $E[\bar{w}] = 0$. [See (5-3).] We can use the multivariate Lindeberg–Feller version of the central limit theorem (D.19.A) to obtain the limiting distribution of $\sqrt{n}\,\bar{w}$.$^4$ Using that formulation, $\bar{w}$ is the average of $n$ independent random vectors $w_i = x_i\varepsilon_i$, with means $0$ and variances

\[ \operatorname{Var}[x_i\varepsilon_i] = \sigma^2 E[x_i x_i'] = \sigma^2 Q_i. \tag{5-9} \]

$^4$ Note that the Lindeberg–Levy variant does not apply because $\operatorname{Var}[w_i]$ is not necessarily constant.

The variance of $\sqrt{n}\,\bar{w}$ is

\[ \sigma^2 \bar{Q}_n = \sigma^2 \left(\frac{1}{n}\right) [Q_1 + Q_2 + \cdots + Q_n]. \tag{5-10} \]

As long as the sum is not dominated by any particular term and the regressors are well behaved, which in this case means that (5-1) holds,

\[ \lim_{n \to \infty} \sigma^2 \bar{Q}_n = \sigma^2 Q. \tag{5-11} \]

Therefore, we may apply the Lindeberg–Feller central limit theorem to the vector $\sqrt{n}\,\bar{w}$, as we did in Section D.3 for the univariate case $\sqrt{n}\,\bar{x}$. We now have the elements we need for a formal result. If $[x_i\varepsilon_i]$, $i = 1, \ldots, n$, are independent vectors distributed with mean $0$ and variance $\sigma^2 Q_i < \infty$, and if (5-1) holds, then

\[ \left(\frac{1}{\sqrt{n}}\right) X'\varepsilon \xrightarrow{d} N[0, \sigma^2 Q]. \tag{5-12} \]

It then follows that

\[ Q^{-1} \left(\frac{1}{\sqrt{n}}\right) X'\varepsilon \xrightarrow{d} N[Q^{-1}0,\; Q^{-1}(\sigma^2 Q)Q^{-1}]. \tag{5-13} \]

Combining terms,

\[ \sqrt{n}(b - \beta) \xrightarrow{d} N[0, \sigma^2 Q^{-1}]. \tag{5-14} \]

Using the technique of Section D.3, we obtain the asymptotic distribution of $b$:

THEOREM 5.1 Asymptotic Distribution of b with Independent Observations
If $\{\varepsilon_i\}$ are independently distributed with mean zero and finite variance $\sigma^2$ and $x_{ik}$ is such that the Grenander conditions are met, then

\[ b \overset{a}{\sim} N\left[\beta, \frac{\sigma^2}{n} Q^{-1}\right]. \tag{5-15} \]

In practice, it is necessary to estimate $(1/n)Q^{-1}$ with $(X'X)^{-1}$ and $\sigma^2$ with $e'e/(n-K)$.

If $\varepsilon$ is normally distributed, then Result FS7 in (Table 4.3, Section 4.8) holds in every sample, so it holds asymptotically as well. The important implication of this derivation is that if the regressors are well behaved and observations are independent, then the asymptotic normality of the least squares estimator does not depend on normality of the disturbances; it is a consequence of the central limit theorem. We will consider other more general cases in the sections to follow.

5.2.3 CONSISTENCY OF $s^2$ AND THE ESTIMATOR OF Asy. Var[b]

To complete the derivation of the asymptotic properties of $b$, we will require an estimator of $\text{Asy. Var}[b] = (\sigma^2/n)Q^{-1}$.$^5$ With (5-1), it is sufficient to restrict attention to $s^2$, so the purpose here is to assess the consistency of $s^2$ as an estimator of $\sigma^2$. Expanding

\[ s^2 = \frac{1}{n-K}\,\varepsilon' M \varepsilon \]

produces

\[ s^2 = \frac{1}{n-K}\left[\varepsilon'\varepsilon - \varepsilon'X(X'X)^{-1}X'\varepsilon\right] = \frac{n}{n-K}\left[\frac{\varepsilon'\varepsilon}{n} - \left(\frac{\varepsilon'X}{n}\right)\left(\frac{X'X}{n}\right)^{-1}\left(\frac{X'\varepsilon}{n}\right)\right]. \]

The leading constant clearly converges to 1. We can apply (5-1), (5-4) (twice), and the product rule for probability limits (Theorem D.14) to assert that the second term in the brackets converges to 0. That leaves

\[ \overline{\varepsilon^2} = \frac{1}{n}\sum_{i=1}^{n} \varepsilon_i^2. \]

This is a narrow case in which the random variables $\varepsilon_i^2$ are independent with the same finite mean $\sigma^2$, so not much is required to get the mean to converge almost surely to $\sigma^2 = E[\varepsilon_i^2]$. By the Markov Theorem (D.8), what is needed is for $E[|\varepsilon_i^2|^{1+\delta}]$ to be finite, so the minimal assumption thus far is that $\varepsilon_i$ have finite moments up to slightly greater than 2. Indeed, if we further assume that every $\varepsilon_i$ has the same distribution, then by the Khinchine Theorem (D.5) or the Corollary to D8, finite moments (of $\varepsilon_i$) up to 2 are sufficient. Mean square convergence would require $E[\varepsilon_i^4] = \phi_\varepsilon < \infty$. Then the terms in the sum are independent, with mean $\sigma^2$ and variance $\phi_\varepsilon - \sigma^4$. So, under fairly weak conditions, the first term in brackets converges in probability to $\sigma^2$, which gives our result,

\[ \operatorname{plim} s^2 = \sigma^2, \]

and,
by the product rule,

\[ \operatorname{plim} s^2 (X'X/n)^{-1} = \sigma^2 Q^{-1}. \]

The appropriate estimator of the asymptotic covariance matrix of $b$ is

\[ \text{Est. Asy. Var}[b] = s^2 (X'X)^{-1}. \]

$^5$ See McCallum (1973) for some useful commentary on deriving the asymptotic covariance matrix of the least squares estimator.

5.2.4 ASYMPTOTIC DISTRIBUTION OF A FUNCTION OF b: THE DELTA METHOD

We can extend Theorem D.22 to functions of the least squares estimator. Let $f(b)$ be a set of $J$ continuous, linear or nonlinear and continuously differentiable functions of the least squares estimator, and let

\[ C(b) = \frac{\partial f(b)}{\partial b'}, \]

where $C$ is the $J \times K$ matrix whose $j$th row is the vector of derivatives of the $j$th function with respect to $b'$. By the Slutsky Theorem (D.12),

\[ \operatorname{plim} f(b) = f(\beta) \]

and

\[ \operatorname{plim} C(b) = \frac{\partial f(\beta)}{\partial \beta'} = \Gamma. \]

Using our usual linear Taylor series approach, we expand this set of functions in the approximation

\[ f(b) = f(\beta) + \Gamma \times (b - \beta) + \text{higher-order terms}. \]

The higher-order terms become negligible in large samples if $\operatorname{plim} b = \beta$. Then, the asymptotic distribution of the function on the left-hand side is the same as that on the right. Thus, the mean of the asymptotic distribution is $\operatorname{plim} f(b) = f(\beta)$, and the asymptotic covariance matrix is $\Gamma[\text{Asy. Var}(b - \beta)]\Gamma'$, which gives us the following theorem:

THEOREM 5.2 Asymptotic Distribution of a Function of b
If $f(b)$ is a set of continuous and continuously differentiable functions of $b$ such that $\Gamma = \partial f(\beta)/\partial \beta'$ and if Theorem 5.1 holds, then

\[ f(b) \overset{a}{\sim} N\left[f(\beta),\; \Gamma\left(\frac{\sigma^2}{n} Q^{-1}\right)\Gamma'\right]. \tag{5-16} \]

In practice, the estimator of the asymptotic covariance matrix would be

\[ \text{Est. Asy. Var}[f(b)] = C[s^2 (X'X)^{-1}]C'. \]

If any of the functions are nonlinear, then the property of unbiasedness that holds for $b$ may not carry over to $f(b)$. Nonetheless, it follows from (5-4) that $f(b)$ is a consistent estimator of $f(\beta)$, and the asymptotic covariance matrix is readily available.

5.2.5 ASYMPTOTIC EFFICIENCY

We have not established any large-sample counterpart to the Gauss-Markov theorem. That is, it remains to establish whether the large-sample properties of the least squares estimator are optimal by any measure. The Gauss-Markov Theorem establishes finite sample conditions under which least squares is optimal. The requirements that the estimator be linear and unbiased limit the theorem’s generality, however. One of the main purposes of the analysis in this chapter is to broaden the class of estimators in the classical model to those which might be biased, but which are consistent. Ultimately, we shall also be interested in nonlinear estimators. These cases extend beyond the reach of the Gauss-Markov Theorem. To make any progress in this direction, we will require an alternative estimation criterion.

DEFINITION 5.1 Asymptotic Efficiency
An estimator is asymptotically efficient if it is consistent, asymptotically normally distributed, and has an asymptotic covariance matrix that is not larger than the asymptotic covariance matrix of any other consistent, asymptotically normally distributed estimator.

In Chapter 17, we will show that if the disturbances are normally distributed, then the least squares estimator is also the maximum likelihood estimator. Maximum likelihood estimators are asymptotically efficient among consistent and asymptotically normally distributed estimators. This gives us a partial result, albeit a somewhat narrow one since to claim it, we must assume normally distributed disturbances. If some other distribution is specified for $\varepsilon$ and it emerges that $b$ is not the maximum likelihood estimator, then least squares may not be efficient.

Example 5.1 The Gamma Regression Model
Greene (1980a) considers estimation in a regression model with an asymmetrically distributed disturbance,

\[ y = (\alpha - \sigma\sqrt{P}) + x'\beta - (\varepsilon - \sigma\sqrt{P}) = \alpha^* + x'\beta + \varepsilon^*, \]

where $\varepsilon$ has the gamma distribution in Section B.4.5 [see (B-39)] and $\sigma = \sqrt{P}/\lambda$ is the standard deviation of the disturbance. In this model, the covariance matrix of the least squares estimator of the slope coefficients (not including the constant term) is

\[ \text{Asy. Var}[b \mid X] = \sigma^2 (X'M^0X)^{-1}, \]

whereas for the maximum likelihood estimator (which is not the least squares estimator),

\[ \text{Asy. Var}[\hat{\beta}_{ML}] \approx [1 - (2/P)]\,\sigma^2 (X'M^0X)^{-1}. \]

$^6$ But for the asymmetry parameter, this result would be the same as for the least squares estimator. We conclude that the estimator that accounts for the asymmetric disturbance distribution is more efficient asymptotically.

$^6$ The matrix $M^0$ produces data in the form of deviations from sample means. (See Section A.2.8.) In Greene’s model, $P$ must be greater than 2.

5.3 MORE GENERAL CASES

The asymptotic properties of the estimators in the classical regression model were established in Section 5.2 under the following assumptions:

A1. Linearity: $y_i = x_{i1}\beta_1 + x_{i2}\beta_2 + \cdots + x_{iK}\beta_K + \varepsilon_i$.
A2. Full rank: The $n \times K$ sample data matrix, $X$, has full column rank.
A3. Exogeneity of the independent variables: $E[\varepsilon_i \mid x_{j1}, x_{j2}, \ldots, x_{jK}] = 0$, $i, j = 1, \ldots, n$.
A4. Homoscedasticity and nonautocorrelation.
A5. Data generating mechanism: independent observations.

The following are the crucial results needed: For consistency of $b$, we need (5-1) and (5-4),

\[ \operatorname{plim}(1/n)X'X = \operatorname{plim} \bar{Q}_n = Q, \text{ a positive definite matrix,} \]
\[ \operatorname{plim}(1/n)X'\varepsilon = \operatorname{plim} \bar{w}_n = E[\bar{w}_n] = 0. \]

(For consistency of $s^2$, we added a fairly weak assumption about the moments of the disturbances.) To establish asymptotic normality, we will require consistency and (5-12), which is

\[ \sqrt{n}\,\bar{w}_n \xrightarrow{d} N[0, \sigma^2 Q]. \]

With these in place, the desired characteristics are then established by the methods of Section 5.2. To analyze other cases, we can merely focus on these three results. It is not necessary to reestablish the consistency or asymptotic normality themselves, since they follow as a consequence.

5.3.1 HETEROGENEITY IN THE DISTRIBUTIONS OF $x_i$

Exceptions to the assumptions made above are likely to arise in two settings. In a panel data set, the sample will consist of multiple observations on each of many observational units. For example, a study might consist of a set of observations made at different points in time on a large number of families. In this case, the $x$s will surely be correlated across observations, at least within observational units. They might even be the same for all the observations on a single family. They are also likely to be a mixture of random variables, such as family income, and nonstochastic regressors, such as a fixed “family effect” represented by a dummy variable. The second case would be a time-series model in which lagged values of the dependent variable appear on the right-hand side of the model.

The panel data set could be treated as follows. Assume for the moment that the data consist of a fixed number of observations, say $T$, on a set of $N$ families, so that the total number of rows in $X$ is $n = NT$. The matrix

\[ \bar{Q}_n = \frac{1}{n}\sum_{i=1}^{n} Q_i, \]

in which $n$ is all the observations in the sample, could be viewed as

\[ \bar{Q}_n = \frac{1}{N}\sum_{i} \frac{1}{T} \sum_{\substack{\text{observations} \\ \text{for family } i}} Q_{ij} = \frac{1}{N}\sum_{i=1}^{N} \bar{Q}_i, \]

where $\bar{Q}_i$ = average $Q_{ij}$ for family $i$. We might then view the set of observations on the $i$th unit as if they were a single observation and apply our convergence arguments to the number of families increasing without bound. The point is that the conditions that are needed to establish convergence will apply with respect to the number of observational units. The number of observations taken for each observation unit might be fixed and could be quite small.

5.3.2 DEPENDENT OBSERVATIONS

The second difficult case arises when there are lagged dependent variables among the variables on the right-hand side or, more generally, in time series settings in which the observations are no longer independent or even uncorrelated. Suppose that the model may be written

\[ y_t = z_t'\theta + \gamma_1 y_{t-1} + \cdots + \gamma_p y_{t-p} + \varepsilon_t. \tag{5-17} \]

(Since this model is a time-series setting, we use $t$ instead of $i$ to index the observations.) We continue to assume that the disturbances are uncorrelated across observations. Since $y_{t-1}$ is dependent on $y_{t-2}$ and so on, it is clear that although the disturbances are uncorrelated across observations, the regressor vectors, including the lagged $y$s, surely are not. Also, although $\operatorname{Cov}[x_t, \varepsilon_s] = 0$ if $s \ge t$, where $x_t = [z_t, y_{t-1}, \ldots, y_{t-p}]$, $\operatorname{Cov}[x_t, \varepsilon_s] \ne 0$ if $s$

[text missing in source]

1.96, which is 0.0612 if the $t[25]$ distribution is correct, and some other value if the disturbances are not normally distributed. The end result is that the standard $t$-test retains a large sample validity. Little can be said about the true size of a test based on the $t$ distribution unless one makes some other equally narrow assumption about $\varepsilon$, but the $t$ distribution is generally used as a reliable approximation.

We will use the same approach to analyze the $F$ statistic for testing a set of $J$ linear restrictions. Step 1 will be to show that with normally distributed disturbances, $JF$ converges to a chi-squared variable as the sample size increases. We will then show that this result is actually independent of the normality of the disturbances; it relies on the central limit theorem. Finally, we consider, as above, the appropriate critical values to use for this test statistic, which only has large sample validity.

The $F$ statistic for testing the validity of $J$ linear restrictions, $R\beta - q = 0$, is given in (6-6). With normally distributed disturbances and under the null hypothesis, the exact distribution of this statistic is $F[J, n-K]$. To see how $F$ behaves more generally, divide the numerator and denominator in (6-6) by $\sigma^2$ and rearrange the fraction slightly, so

\[ F = \frac{(Rb - q)'\{R[\sigma^2 (X'X)^{-1}]R'\}^{-1}(Rb - q)}{J(s^2/\sigma^2)}. \tag{6-23} \]

Since $\operatorname{plim} s^2 = \sigma^2$ and $\operatorname{plim}(X'X/n) = Q$, the denominator of $F$ converges to $J$ and the bracketed term in the numerator will behave the same as $(\sigma^2/n)RQ^{-1}R'$. Hence, regardless of what this distribution is, if $F$ has a limiting distribution, then it is the same as the limiting distribution of

\[ W^* = \frac{1}{J}(Rb - q)'[R(\sigma^2/n)Q^{-1}R']^{-1}(Rb - q) = \frac{1}{J}(Rb - q)'\{\text{Asy. Var}[Rb - q]\}^{-1}(Rb - q). \]

This expression is $(1/J)$ times a Wald statistic, based on the asymptotic distribution. The large-sample distribution of $W^*$ will be that of $(1/J)$ times a chi-squared with $J$ degrees of freedom. It follows that with normally distributed disturbances, $JF$ converges to a chi-squared variate with $J$ degrees of freedom. The proof is instructive. [See White (2001, p. 76).]

THEOREM 6.1 Limiting Distribution of the Wald Statistic
If $\sqrt{n}(b - \beta) \xrightarrow{d} N[0, \sigma^2 Q^{-1}]$ and if $H_0: R\beta - q = 0$ is true, then

\[ W = (Rb - q)'\{R s^2 (X'X)^{-1} R'\}^{-1}(Rb - q) = JF \xrightarrow{d} \chi^2[J]. \]

Proof: Since $R$ is a matrix of constants and $R\beta = q$,

\[ \sqrt{n}\,R(b - \beta) = \sqrt{n}(Rb - q) \xrightarrow{d} N[0, R(\sigma^2 Q^{-1})R']. \tag{1} \]

For convenience, write this equation as

\[ z \xrightarrow{d} N[0, P]. \tag{2} \]

In Section A.6.11, we define the inverse square root of a positive definite matrix $P$ as another matrix, say $T$, such that $T^2 = P^{-1}$, and denote $T$ as $P^{-1/2}$. Let $T$ be the inverse square root of $P$. Then, by the same reasoning as in (1) and (2),

\[ \text{if } z \xrightarrow{d} N[0, P], \text{ then } P^{-1/2}z \xrightarrow{d} N[0, P^{-1/2}PP^{-1/2}] = N[0, I]. \tag{3} \]

We now invoke Theorem D.21 for the limiting distribution of a function of a random variable. The sum of squares of uncorrelated (i.e., independent) standard normal variables is distributed as chi-squared. Thus,

\[ (P^{-1/2}z)'(P^{-1/2}z) = z'P^{-1}z \xrightarrow{d} \chi^2(J). \tag{4} \]

Reassembling the parts from before, we have shown that the limiting distribution of

\[ n(Rb - q)'[R(\sigma^2 Q^{-1})R']^{-1}(Rb - q) \tag{5} \]

is chi-squared, with $J$ degrees of freedom. Note the similarity of this result to the results of Section B.11.6. Finally, if

\[ \operatorname{plim} s^2 \left(\frac{1}{n}X'X\right)^{-1} = \sigma^2 Q^{-1}, \tag{6} \]

then the statistic obtained by replacing $\sigma^2 Q^{-1}$ by $s^2(X'X/n)^{-1}$ in (5) has the same limiting distribution. The $n$s cancel, and we are left with the same Wald statistic we looked at before. This step completes the proof.

The appropriate critical values for the $F$ test of the restrictions $R\beta - q = 0$ converge from above to $1/J$ times those for a chi-squared test based on the Wald statistic (see the Appendix tables). For example, for testing $J = 5$ restrictions, the critical value from the chi-squared table (Appendix Table G.4) for 95 percent significance is 11.07. The critical values from the $F$ table (Appendix Table G.5) are 3.33 = 16.65/5 for $n - K = 10$, 2.60 = 13.00/5 for $n - K = 25$, 2.40 = 12.00/5 for $n - K = 50$, 2.31 = 11.55/5 for $n - K = 100$, and 2.214 = 11.07/5 for large $n - K$. Thus, with normally distributed disturbances, as $n$ gets large, the $F$ test can be carried out by referring $JF$ to the critical values from the chi-squared table.

The crucial result for our purposes here is that the distribution of the Wald statistic is built up from the distribution of $b$, which is asymptotically normal even without normally distributed disturbances. The implication is that an appropriate large sample test statistic is chi-squared = $JF$. Once again, this implication relies on the central limit theorem, not on normally distributed disturbances. Now, what is the appropriate approach for a small or moderately sized sample? As we saw earlier, the critical values for the $F$ distribution converge from above to $(1/J)$ times those for the preceding chi-squared distribution. As before, one cannot say that this will always be true in every case for every possible configuration of the data and parameters. Without some special configuration of the data and parameters, however, one can expect it to occur generally. The implication is that absent some additional firm characterization of the model, the $F$ statistic, with the critical values from the $F$ table, remains a conservative approach that becomes more accurate as the sample size increases.

Exercise 7 at the end of this chapter suggests another approach to testing that has validity in large samples, a Lagrange multiplier test. The vector of Lagrange multipliers in (6-14) is $[R(X'X)^{-1}R']^{-1}(Rb - q)$, that is, a multiple of the least squares discrepancy vector. In principle, a test of the hypothesis that $\lambda$ equals zero should be equivalent to a test of the null hypothesis. Since the leading matrix has full rank, this can only equal zero if the discrepancy equals zero. A Wald test of the hypothesis that $\lambda = 0$ is indeed a valid way to proceed. The large sample distribution of the Wald statistic would be chi-squared with $J$ degrees of freedom. (The procedure is considered in Exercise 7.) For a set of exclusion restrictions, $\beta_2 = 0$, there is a simple way to carry out this test. The chi-squared statistic, in this case with $K_2$ degrees of freedom, can be computed as $nR^2$ in the regression of $e^*$ (the residuals in the short regression) on the full set of independent variables.

6.5 TESTING NONLINEAR RESTRICTIONS

The preceding discussion has relied heavily on the linearity of the regression model. When we analyze nonlinear functions of the parameters and nonlinear regression models, most of these exact distributional results no longer hold.

The general problem is that of testing a hypothesis that involves a nonlinear function of the regression coefficients:

\[ H_0: c(\beta) = q. \]

We shall look first at the case of a single restriction. The more general one, in which $c(\beta) = q$ is a set of restrictions, is a simple extension. The counterpart to the test statistic we used earlier would be

\[ z = \frac{c(\hat{\beta}) - q}{\text{estimated standard error}} \tag{6-24} \]

or its square, which in the preceding were distributed as $t[n-K]$ and $F[1, n-K]$, respectively. The discrepancy in the numerator presents no difficulty. Obtaining an estimate of the sampling variance of $c(\hat{\beta}) - q$, however, involves the variance of a nonlinear function of $\hat{\beta}$.

The results we need for this computation are presented in Sections B.10.3 and D.3.1. A linear Taylor series approximation to $c(\hat{\beta})$ around the true parameter vector $\beta$ is

\[ c(\hat{\beta}) \approx c(\beta) + \left(\frac{\partial c(\beta)}{\partial \beta}\right)'(\hat{\beta} - \beta). \tag{6-25} \]

We must rely on consistency rather than unbiasedness here, since, in general, the expected value of a nonlinear function is not equal to the function of the expected value. If $\operatorname{plim} \hat{\beta} = \beta$, then we are justified in using $c(\hat{\beta})$ as an estimate of $c(\beta)$. (The relevant result is the Slutsky theorem.) Assuming that our use of this approximation is appropriate, the variance of the nonlinear function is approximately equal to the variance of the right-hand side, which is, then,

\[ \operatorname{Var}[c(\hat{\beta})] \approx \left(\frac{\partial c(\beta)}{\partial \beta}\right)' \operatorname{Var}[\hat{\beta}] \left(\frac{\partial c(\beta)}{\partial \beta}\right). \tag{6-26} \]

The derivatives in the expression for the variance are functions of the unknown parameters. Since these are being estimated, we use our sample estimates in computing the derivatives. To estimate the variance of the estimator, we can use $s^2$
$(\mathbf{X}'\mathbf{X})^{-1}$. Finally, we rely on Theorem D.2.2 in Section D.3.1 and use the standard normal distribution instead of the $t$ distribution for the test statistic. Using $\mathbf{g}(\hat{\boldsymbol{\beta}})$ to estimate $\mathbf{g}(\boldsymbol{\beta}) = \partial c(\boldsymbol{\beta})/\partial \boldsymbol{\beta}$, we can now test a hypothesis in the same fashion we did earlier.

Example 6.3  A Long-Run Marginal Propensity to Consume

A consumption function that has different short- and long-run marginal propensities to consume can be written in the form

\[ \ln C_t = \alpha + \beta \ln Y_t + \gamma \ln C_{t-1} + \varepsilon_t, \]

which is a distributed lag model. In this model, the short-run marginal propensity to consume (MPC) (elasticity, since the variables are in logs) is $\beta$, and the long-run MPC is $\delta = \beta/(1-\gamma)$. Consider testing the hypothesis that $\delta = 1$.

Quarterly data on aggregate U.S. consumption and disposable personal income for the years 1950 to 2000 are given in Appendix Table F5.1. The estimated equation based on these data is

\[ \ln C_t = \underset{(0.01055)}{0.003142} + \underset{(0.02873)}{0.07495} \ln Y_t + \underset{(0.02859)}{0.9246} \ln C_{t-1} + e_t, \qquad R^2 = 0.999712, \quad s = 0.00874. \]

Estimated standard errors are shown in parentheses. We will also require Est.Asy.Cov$[b, c] = -0.0003298$. The estimate of the long-run MPC is $d = b/(1-c) = 0.07495/(1-0.9246) = 0.99403$. To compute the estimated variance of $d$, we will require

\[ g_b = \frac{\partial d}{\partial b} = \frac{1}{1-c} = 13.2626, \qquad g_c = \frac{\partial d}{\partial c} = \frac{b}{(1-c)^2} = 13.1834. \]

The estimated asymptotic variance of $d$ is

\[ \text{Est.Asy.Var}[d] = g_b^2\, \text{Est.Asy.Var}[b] + g_c^2\, \text{Est.Asy.Var}[c] + 2 g_b g_c\, \text{Est.Asy.Cov}[b, c] \]
\[ = 13.2626^2 \times 0.02873^2 + 13.1834^2 \times 0.02859^2 + 2(13.2626)(13.1834)(-0.0003298) = 0.17192. \]

The square root is 0.41464. To test the hypothesis that the long-run MPC is greater than or equal to 1, we would use

\[ z = \frac{0.99403 - 1}{0.41464} = -0.0144. \]

Because we are using a large sample approximation, we refer to a standard normal table instead of the $t$ distribution. The hypothesis that $\delta = 1$ is not rejected.

You may have noticed that we could have tested this hypothesis with a linear restriction instead; if $\delta = 1$, then $\beta = 1 - \gamma$, or $\beta + \gamma = 1$. The estimate is $q = b + c - 1 = -0.00045$. The estimated standard error of this linear function is $[0.02873^2 + 0.02859^2 - 2(0.0003298)]^{1/2} = 0.03136$. The $t$ ratio for this test is $-0.01435$, which is the same as before. Since the sample used here is fairly large, this is to be expected. However, there is nothing in the computations that assures this outcome. In a smaller sample, we might have obtained a different answer. For example, using the last 11 years of the data, the $t$ statistics for the two hypotheses are 7.652 and 5.681. The Wald test is not invariant to how the hypothesis is formulated. In a borderline case, we could have reached a different conclusion. This lack of invariance does not occur with the likelihood ratio or Lagrange multiplier tests discussed in Chapter 17. On the other hand, both of these tests require an assumption of normality, whereas the Wald statistic does not. This illustrates one of the trade-offs between a more detailed specification and the power of the test procedures that are implied.

The generalization to more than one function of the parameters proceeds along similar lines. Let $\mathbf{c}(\hat{\boldsymbol{\beta}})$ be a set of $J$ functions of the estimated parameter vector and let the $J \times K$ matrix of derivatives of $\mathbf{c}(\hat{\boldsymbol{\beta}})$ be

\[ \hat{\mathbf{G}} = \frac{\partial \mathbf{c}(\hat{\boldsymbol{\beta}})}{\partial \hat{\boldsymbol{\beta}}'}. \tag{6-27} \]

The estimate of the asymptotic covariance matrix of these functions is

\[ \text{Est.Asy.Var}[\hat{\mathbf{c}}] = \hat{\mathbf{G}} \left\{ \text{Est.Asy.Var}[\hat{\boldsymbol{\beta}}] \right\} \hat{\mathbf{G}}'. \tag{6-28} \]

The $j$th row of $\mathbf{G}$ is the $K$ derivatives of $c_j$ with respect to the $K$ elements of $\hat{\boldsymbol{\beta}}$. For example, the covariance matrix for estimates of the short- and long-run marginal propensities to consume would be obtained using

\[ \mathbf{G} = \begin{bmatrix} 0 & 1 & 0 \\ 0 & 1/(1-\gamma) & \beta/(1-\gamma)^2 \end{bmatrix}. \]

The statistic for testing the $J$ hypotheses $\mathbf{c}(\boldsymbol{\beta}) = \mathbf{q}$ is

\[ W = (\hat{\mathbf{c}} - \mathbf{q})' \left\{ \text{Est.Asy.Var}[\hat{\mathbf{c}}] \right\}^{-1} (\hat{\mathbf{c}} - \mathbf{q}). \tag{6-29} \]

In large samples, $W$ has a chi-squared distribution with degrees of freedom equal to the number of restrictions. Note that for a single restriction, this value is the square of the statistic in (6-24).

6.6 PREDICTION

After the estimation of parameters, a common use of regression is for prediction.⁸ Suppose that we wish to predict the value of $y^0$ associated with a regressor vector $\mathbf{x}^0$. This value would be $y^0 = \mathbf{x}^{0\prime}\boldsymbol{\beta} + \varepsilon^0$. It follows from the Gauss–Markov theorem that

\[ \hat{y}^0 = \mathbf{x}^{0\prime}\mathbf{b} \tag{6-30} \]

is the minimum variance linear unbiased estimator of $E[y^0 \mid \mathbf{x}^0]$. The forecast error is

\[ e^0 = y^0 - \hat{y}^0 = (\boldsymbol{\beta} - \mathbf{b})'\mathbf{x}^0 + \varepsilon^0. \]

The prediction variance to be applied to this estimate is

\[ \operatorname{Var}[e^0 \mid \mathbf{X}, \mathbf{x}^0] = \sigma^2 + \operatorname{Var}[(\boldsymbol{\beta} - \mathbf{b})'\mathbf{x}^0 \mid \mathbf{X}, \mathbf{x}^0] = \sigma^2 + \mathbf{x}^{0\prime}[\sigma^2(\mathbf{X}'\mathbf{X})^{-1}]\mathbf{x}^0. \tag{6-31} \]

If the regression contains a constant term, then an equivalent expression is

\[ \operatorname{Var}[e^0] = \sigma^2 \left[ 1 + \frac{1}{n} + \sum_{j=1}^{K-1} \sum_{k=1}^{K-1} (x_j^0 - \bar{x}_j)(x_k^0 - \bar{x}_k)(\mathbf{Z}'\mathbf{M}^0\mathbf{Z})^{jk} \right], \]

where $\mathbf{Z}$ is the $K-1$ columns of $\mathbf{X}$ not including the constant. This result shows that the width of the interval depends on the distance of the elements of $\mathbf{x}^0$ from the center of the data. Intuitively, this idea makes sense; the farther the forecasted point is from the center of our experience, the greater is the degree of uncertainty.

The prediction variance can be estimated by using $s^2$ in place of $\sigma^2$. A confidence interval for $y^0$ would be formed using

\[ \text{prediction interval} = \hat{y}^0 \pm t_{\lambda/2}\, \text{se}(e^0). \]

Figure 6.1 shows the effect for the bivariate case. Note that the prediction variance is composed of three parts. The second and third become progressively smaller as we accumulate more data (i.e., as $n$ increases). But the first term $\sigma^2$ is constant, which implies that no matter how much data we have, we can never predict perfectly.

FIGURE 6.1  Prediction Intervals. (Plot of $y$ against $x$ showing the fitted value $\hat{y}$ and the interval bounds.)

⁸ It is necessary at this point to make a largely semantic distinction between “prediction” and “forecasting.” We will use the term “prediction” to mean using the regression model to compute fitted values of the dependent variable, either within the sample or for observations outside the sample. The same set of results will apply to cross sections, time series, or panels. These are the methods considered in this section. It is helpful at this point to reserve the term “forecasting” for usage of the time series models discussed in Chapter 20. One of the distinguishing features of the models in that setting will be the explicit role of “time” and the presence of lagged variables and disturbances in the equations and correlation of variables with past values.

Example 6.4  Prediction for Investment

Suppose that we wish to “predict” the first quarter 2001 value of real investment. The average rate (secondary market) for the 90 day T-bill was 4.48% (down from 6.03 at the end of 2000); real GDP was 9316.8; the CPI_U was 528.0 and the time trend would equal 204. (We dropped one observation to compute the rate of inflation. Data were obtained from www.economagic.com.) The rate of inflation on a yearly basis would be $100\% \times 4 \times \ln(528.0/521.1) = 5.26\%$. The data vector for predicting $\ln I_{2001.1}$ would be $\mathbf{x}^0 = [1, 4.48, 5.26, 9.1396, 204]'$. Using the regression results in Example 6.1,

\[ \mathbf{x}^{0\prime}\mathbf{b} = [1, 4.48, 5.26, 9.1396, 204] \times [-9.1345, -0.008601, 0.003308, 1.9302, -0.005659]' = 7.3312. \]

The estimated variance of this prediction is

\[ s^2[1 + \mathbf{x}^{0\prime}(\mathbf{X}'\mathbf{X})^{-1}\mathbf{x}^0] = 0.0076912. \tag{6-32} \]

The square root, 0.087699
, gives the prediction standard deviation. Using this value, we obtain the prediction interval: $7.3312 \pm 1.96(0.087699) = [7.1593, 7.5031]$. The yearly rate of real investment in the first quarter of 2001 was 1721. The log is 7.4507, so our forecast interval contains the actual value.

We have forecasted the log of real investment with our regression model. If it is desired to forecast the level, the natural estimator would be $\hat{I} = \exp(\widehat{\ln I})$. Assuming that the estimator, itself, is at least asymptotically normally distributed, this should systematically underestimate the level by a factor of $\exp(\hat{\sigma}^2/2)$ based on the mean of the lognormal distribution. [See Wooldridge (2000, p. 203) and Section B.4.4.] It remains to determine what to use for $\hat{\sigma}^2$. In (6-32), the second part of the expression will vanish in large samples, leaving (as Wooldridge suggests) $s^2 = 0.007427$.⁹ Using this scaling, we obtain a prediction of 1532.9, which is still 11 percent below the actual value. Evidently, this model based on an extremely long time series does not do a very good job of predicting at the end of the sample period. One might surmise various reasons, including some related to the model specification that we will address in Chapter 20, but as a first guess, it seems optimistic to apply an equation this simple to more than 50 years of data while expecting the underlying structure to be unchanging through the entire period. To investigate this possibility, we redid all the preceding calculations using only the data from 1990 to 2000 for the estimation. The prediction for the level of investment in 2001.1 is now 1885.2 (using the suggested scaling), which is an overestimate of 9.54 percent. But, this is more easily explained. The first quarter of 2001 began the first recession in the U.S. economy in nearly 10 years, and one of the early symptoms of a recession is a rapid decline in business investment.

⁹ Wooldridge suggests an alternative not necessarily based on an assumption of normality. Use as the scale factor the single coefficient in a within sample regression of $y_i$ on the exponents of the fitted logs.

All the preceding assumes that $\mathbf{x}^0$ is either known with certainty, ex post, or forecasted perfectly. If $\mathbf{x}^0$ must, itself, be forecasted (an ex ante forecast), then the formula for the forecast variance in (6-31) would have to be modified to include the variation in $\mathbf{x}^0$, which greatly complicates the computation. Most authors view it as simply intractable. Beginning with Feldstein (1971), firm analytical results for the correct forecast variance in this case remain to be derived except for simple special cases. The one qualitative result that seems certain is that (6-31) will understate the true variance. McCullough (1996) presents an alternative approach to computing appropriate forecast standard errors based on the method of bootstrapping. (See the end of Section 16.3.2.)

Various measures have been proposed for assessing the predictive accuracy of forecasting models.¹⁰ Most of these measures are designed to evaluate ex post forecasts, that is, forecasts for which the independent variables do not themselves have to be forecasted. Two measures that are based on the residuals from the forecasts are the root mean squared error

\[ \text{RMSE} = \sqrt{\frac{1}{n^0} \sum_i (y_i - \hat{y}_i)^2} \]

and the mean absolute error

\[ \text{MAE} = \frac{1}{n^0} \sum_i |y_i - \hat{y}_i|, \]

where $n^0$ is the number of periods being forecasted. (Note that both of these, as well as the measures below, are backward looking in that they are computed using the observed data on the independent variable.) These statistics have an obvious scaling problem: multiplying values of the dependent variable by any scalar multiplies the measure by that scalar as well. Several measures that are scale free are based on the Theil $U$ statistic:¹¹

\[ U = \sqrt{\frac{(1/n^0) \sum_i (y_i - \hat{y}_i)^2}{(1/n^0) \sum_i y_i^2}}. \]

This measure is related to $R^2$ but is not bounded by zero and one. Large values indicate a poor forecasting performance. An alternative is to compute the measure in terms of the changes in $y$:

\[ U_\Delta = \sqrt{\frac{(1/n^0) \sum_i (\Delta y_i - \Delta \hat{y}_i)^2}{(1/n^0) \sum_i (\Delta y_i)^2}}, \]

where $\Delta y_i = y_i - y_{i-1}$ and $\Delta \hat{y}_i = \hat{y}_i - y_{i-1}$, or, in percentage changes, $\Delta y_i = (y_i - y_{i-1})/y_{i-1}$ and $\Delta \hat{y}_i = (\hat{y}_i - y_{i-1})/y_{i-1}$. These measures will reflect the model’s ability to track turning points in the data.

¹⁰ See Theil (1961) and Fair (1984).
¹¹ Theil (1961).

6.7 SUMMARY AND CONCLUSIONS

This chapter has focused on two uses of the linear regression model, hypothesis testing and basic prediction. The central result for testing hypotheses is the $F$ statistic. The $F$ ratio can be produced in two equivalent ways; first, by measuring the extent to which the unrestricted least squares
estimate differs from what a hypothesis would predict and second, by measuring the loss of fit that results from assuming that a hypothesis is correct. We then extended the $F$ statistic to more general settings by examining its large sample properties, which allow us to discard the assumption of normally distributed disturbances, and by extending it to nonlinear restrictions.

Key Terms and Concepts

• Alternative hypothesis
• Distributed lag
• Discrepancy vector
• Exclusion restrictions
• Ex post forecast
• Lagrange multiplier test
• Limiting distribution
• Linear restrictions
• Nested models
• Nonlinear restriction
• Nonnested models
• Noninvariance of Wald test
• Nonnormality
• Null hypothesis
• Parameter space
• Prediction interval
• Prediction variance
• Restricted least squares
• Root mean squared error
• Testable implications
• Theil U statistic
• Wald criterion

Exercises

1. A multiple regression of $y$ on a constant, $x_1$, and $x_2$ produces the following results: $\hat{y} = 4 + 0.4x_1 + 0.9x_2$, $R^2 = 8/60$, $\mathbf{e}'\mathbf{e} = 520$, $n = 29$,

\[ \mathbf{X}'\mathbf{X} = \begin{bmatrix} 29 & 0 & 0 \\ 0 & 50 & 10 \\ 0 & 10 & 80 \end{bmatrix}. \]

Test the hypothesis that the two slopes sum to 1.

2. Using the results in Exercise 1, test the hypothesis that the slope on $x_1$ is 0 by running the restricted regression and comparing the two sums of squared deviations.

3. The regression model to be analyzed is $\mathbf{y} = \mathbf{X}_1\boldsymbol{\beta}_1 + \mathbf{X}_2\boldsymbol{\beta}_2 + \boldsymbol{\varepsilon}$, where $\mathbf{X}_1$ and $\mathbf{X}_2$ have $K_1$ and $K_2$ columns, respectively. The restriction is $\boldsymbol{\beta}_2 = \mathbf{0}$.
a. Using (6-14), prove that the restricted estimator is simply $[\mathbf{b}_{1*}, \mathbf{0}]$, where $\mathbf{b}_{1*}$ is the least squares coefficient vector in the regression of $\mathbf{y}$ on $\mathbf{X}_1$.
b. Prove that if the restriction is $\boldsymbol{\beta}_2 = \boldsymbol{\beta}_2^0$ for a nonzero $\boldsymbol{\beta}_2^0$, then the restricted estimator of $\boldsymbol{\beta}_1$ is $\mathbf{b}_{1*} = (\mathbf{X}_1'\mathbf{X}_1)^{-1}\mathbf{X}_1'(\mathbf{y} - \mathbf{X}_2\boldsymbol{\beta}_2^0)$.

4. The expression for the restricted coefficient vector in (6-14) may be written in the form $\mathbf{b}_* = [\mathbf{I} - \mathbf{C}\mathbf{R}]\mathbf{b} + \mathbf{w}$, where $\mathbf{w}$ does not involve $\mathbf{b}$. What is $\mathbf{C}$? Show that the covariance matrix of the restricted least squares estimator is

\[ \sigma^2(\mathbf{X}'\mathbf{X})^{-1} - \sigma^2(\mathbf{X}'\mathbf{X})^{-1}\mathbf{R}'[\mathbf{R}(\mathbf{X}'\mathbf{X})^{-1}\mathbf{R}']^{-1}\mathbf{R}(\mathbf{X}'\mathbf{X})^{-1} \]

and that this matrix may be written as

\[ \operatorname{Var}[\mathbf{b} \mid \mathbf{X}] \left\{ [\operatorname{Var}(\mathbf{b} \mid \mathbf{X})]^{-1} - \mathbf{R}'[\operatorname{Var}(\mathbf{R}\mathbf{b} \mid \mathbf{X})]^{-1}\mathbf{R} \right\} \operatorname{Var}[\mathbf{b} \mid \mathbf{X}]. \]

5. Prove the result that the restricted least squares estimator never has a larger covariance matrix than the unrestricted least squares estimator.

6. Prove the result that the $R^2$ associated with a restricted least squares estimator is never larger than that associated with the unrestricted least squares estimator. Conclude that imposing restrictions never improves the fit of the regression.

7. The Lagrange multiplier test of the hypothesis $\mathbf{R}\boldsymbol{\beta} - \mathbf{q} = \mathbf{0}$ is equivalent to a Wald test of the hypothesis that $\boldsymbol{\lambda} = \mathbf{0}$, where $\boldsymbol{\lambda}$ is defined in (6-14). Prove that

\[ \chi^2 = \boldsymbol{\lambda}' \left\{ \text{Est.Var}[\boldsymbol{\lambda}] \right\}^{-1} \boldsymbol{\lambda} = (n-K) \left[ \frac{\mathbf{e}_*'\mathbf{e}_*}{\mathbf{e}'\mathbf{e}} - 1 \right]. \]

Note that the fraction in brackets is the ratio of two estimators of $\sigma^2$. By virtue of (6-19) and the preceding discussion, we know that this ratio is greater than 1. Finally, prove that the Lagrange multiplier statistic is equivalent to $JF$, where $J$ is the number of restrictions being tested and $F$ is the conventional $F$ statistic given in (6-6).

8. Use the Lagrange multiplier test to test the hypothesis in Exercise 1.

9. Using the data and model of Example 2.3, carry out a test of the hypothesis that the three aggregate price indices are not significant determinants of the demand for gasoline.

10. The full model of Example 2.3 may be written in logarithmic terms as

\[ \ln G/pop = \alpha + \beta_p \ln P_g + \beta_y \ln Y + \gamma_{nc} \ln P_{nc} + \gamma_{uc} \ln P_{uc} + \gamma_{pt} \ln P_{pt} + \beta\, year + \delta_d \ln P_d + \delta_n \ln P_n + \delta_s \ln P_s + \varepsilon. \]

Consider the hypothesis that the microelasticities are a constant proportion of the elasticity with respect to their corresponding aggregate. Thus, for some positive $\theta$ (presumably between 0 and 1), $\gamma_{nc} = \theta\delta_d$, $\gamma_{uc} = \theta\delta_d$, $\gamma_{pt} = \theta\delta_s$. The first two imply the simple linear restriction $\gamma_{nc} = \gamma_{uc}$. By taking ratios, the first (or second) and third imply the nonlinear restriction

\[ \frac{\gamma_{nc}}{\gamma_{pt}} = \frac{\delta_d}{\delta_s} \quad \text{or} \quad \gamma_{nc}\delta_s - \gamma_{pt}\delta_d = 0. \]

a. Describe in detail how you would test the validity of the restriction.
b. Using the gasoline market data in Table F2.2, test the restrictions separately and jointly.

11. Prove that under the hypothesis that $\mathbf{R}\boldsymbol{\beta} = \mathbf{q}$, the estimator

\[ s_*^2 = \frac{(\mathbf{y} - \mathbf{X}\mathbf{b}_*)'(\mathbf{y} - \mathbf{X}\mathbf{b}_*)}{n - K + J}, \]

where $J$ is the number of restrictions, is unbiased for $\sigma^2$.

12. Show that the multiple regression of $y$ on a constant, $x_1$, and $x_2$, while imposing the restriction $\beta_1 + \beta_2 = 1$, leads to the regression of $y - x_1$ on a constant and $x_2 - x_1$.

7 FUNCTIONAL FORM AND STRUCTURAL CHANGE

7.1 INTRODUCTION

In this chapter, we are concerned with the functional form of the regression model. Many different types of functions are “linear” by the definition considered in Section 2.3.1. By using different transformations of the dependent and independent variables, dummy variables and different arrangements of functions of variables, a wide variety of models can be constructed that are all estimable by linear least squares. Section 7.2 considers using binary variables to accommodate nonlinearities in the model. Section 7.3 broadens the class of models that are linear in the parameters. Sections 7.4 and 7.5 then examine the issue of specifying and testing for change in the underlying model that generates the data, under the heading of structural change.

7.2 USING BINARY VARIABLES

One of the most useful devices in regression analysis is the binary, or dummy variable. A dummy variable takes the value one for some observations to indicate the presence of an effect or membership in a group and zero for the remaining observations. Binary variables are a convenient means of building discrete shifts of the function into a regression model.

7.2.1 BINARY VARIABLES IN REGRESSION

Dummy variables are usually used in regression equations that also contain other quantitative variables. In the earnings equation in Example 4.3, we included a variable Kids to indicate whether there were children in the household under the assumption that for many married women, this fact is a significant consideration in labor supply behavior. The results shown in Example 7.1 appear to be consistent with this hypothesis.

Example 7.1  Dummy Variable in an Earnings Equation

Table 7.1 following reproduces the estimated earnings equation in Example 4.3. The variable Kids is a dummy variable, which equals one if there are children under 18 in the household and zero otherwise. Since this is a semilog equation, the value of −.35 for the coefficient is an extremely large effect, one that suggests that all other things equal, the earnings of women with children are nearly a third less than those without. This is a large difference, but one that would certainly merit closer scrutiny. Whether this effect results from different labor market effects which affect wages and not hours, or the reverse, remains to be seen. Second, having chosen a nonrandomly selected sample of those with only positive earnings to begin with, it is unclear whether the sampling mechanism has, itself, induced a bias in this coefficient.

TABLE 7.1  Estimated Earnings Equation

$\ln \text{earnings} = \beta_1 + \beta_2\,\text{age} + \beta_3\,\text{age}^2 + \beta_4\,\text{education} + \beta_5\,\text{kids} + \varepsilon$

Sum of squared residuals:            599.4582
Standard error of the regression:    1.19044
R² based on 428 observations:        0.040995

Variable     Coefficient    Standard Error    t Ratio
Constant      3.24009       1.7674             1.833
Age           0.20056       0.08386            2.392
Age²         −0.0023147     0.00098688        −2.345
Education     0.067472      0.025248           2.672
Kids         −0.35119       0.14753           −2.380

In recent applications, researchers in many fields have studied the effects of treatment on some kind of response. Examples include the effect of college on lifetime income, sex differences in labor supply behavior as in Example 7.1 and in salary structures in industries, and pre- versus postregime shifts in macroeconomic models, to name but a few. These examples can all be formulated in regression models involving a single dummy variable:

\[ y_i = \mathbf{x}_i'\boldsymbol{\beta} + \delta d_i + \varepsilon_i. \]

One of the important issues in policy analysis concerns measurement of such treatment effects when the dummy variable results from an individual participation decision. For example, in studies of the effect of job training programs on post-training earnings, the “treatment dummy” might be measuring the latent motivation and initiative of the participants rather than the effect of the program, itself. We will revisit this subject in Section 22.4.

It is common for researchers to include a dummy variable in a regression to account for something that applies only to a single observation. For example, in time-series analyses, an occasional study includes a dummy variable that is one only in a single unusual year, such as the year of a major strike or a major policy event. (See, for example, the application to the German money demand function in Section 20.6.5.) It is easy to show (we consider this in the exercises) the very useful implication of this: A dummy variable that takes the value one only for one observation has the effect of deleting that observation from computation of the least squares slopes and variance estimator (but not R-squared).

7.2.2 SEVERAL CATEGORIES

When there are several categories, a set of binary variables is necessary. Correcting for seasonal factors in macroeconomic data is a common application. We could write a consumption function for quarterly data as

\[ C_t = \beta_1 + \beta_2 x_t + \delta_1 D_{t1} + \delta_2 D_{t2} + \delta_3 D_{t3} + \varepsilon_t, \]

where $x_t$ is disposable income. Note that only three of the four quarterly dummy variables are included in the model. If the fourth were included, then the four dummy variables would sum to one at every observation, which would reproduce the constant term, a case of perfect multicollinearity. This is known as the dummy variable trap. Thus, to avoid the dummy variable trap, we drop the dummy variable for the fourth quarter. (Depending on the application, it might be preferable to have four separate dummy variables and drop the overall constant.)¹ Any of the four quarters (or 12 months) can be used as the base period.

The preceding is a means of deseasonalizing the data. Consider the alternative formulation:

\[ C_t = \beta x_t + \delta_1 D_{t1} + \delta_2 D_{t2} + \delta_3 D_{t3} + \delta_4 D_{t4} + \varepsilon_t. \tag{7-1} \]

Using the results from Chapter 3 on partitioned regression, we know that the preceding multiple regression is equivalent to first regressing $C$ and $x$ on the four dummy variables and then using the residuals from these regressions in the subsequent regression of deseasonalized consumption on deseasonalized income. Clearly, deseasonalizing in this fashion prior to computing the simple regression of consumption on income produces the same coefficient on income (and the same vector of residuals) as including the set of dummy variables in the regression.

7.2.3 SEVERAL GROUPINGS

The case in which several sets of dummy variables are needed is much the same as those we have already considered, with one important exception. Consider a model of statewide per capita expenditure on education $y$ as a function of statewide per capita income $x$. Suppose that we have observations on all $n = 50$ states for $T = 10$ years. A regression model that allows the expected expenditure to change over time as well as across states would be

\[ y_{it} = \alpha + \beta x_{it} + \delta_i + \theta_t + \varepsilon_{it}. \tag{7-2} \]

As before, it is necessary to drop one of the variables in each set of dummy variables to avoid the dummy variable trap. For our example, if a total of 50 state dummies and 10 time dummies is retained, a problem of “perfect multicollinearity” remains; the sums of the 50 state dummies and the 10 time dummies are the same, that is, 1. One of the variables in each of the sets (or the overall constant term and one of the variables in one of the sets) must be omitted.

Example 7.2  Analysis of Covariance

The data in Appendix Table F7.1 were used in a study of efficiency in production of airline services in Greene (1997b). The airline industry has been a favorite subject of study [
e.g., Schmidt and Sickles (1984); Sickles, Good, and Johnson (1986)], partly because of interest in this rapidly changing market in a period of deregulation and partly because of an abundance of large, high-quality data sets collected by the (no longer existent) Civil Aeronautics Board. The original data set consisted of 25 firms observed yearly for 15 years (1970 to 1984), a “balanced panel.” Several of the firms merged during this period and several others experienced strikes, which reduced the number of complete observations substantially. Omitting these and others because of missing data on some of the variables left a group of 10 full observations, from which we have selected six for the examples to follow.

¹ See Suits (1984) and Greene and Seaks (1991).

We will fit a cost equation of the form

\[ \ln C_{i,t} = \beta_1 + \beta_2 \ln Q_{i,t} + \beta_3 \ln^2 Q_{i,t} + \beta_4 \ln P_{fuel\,i,t} + \beta_5 Loadfactor_{i,t} + \sum_{t=1}^{14} \theta_t D_{i,t} + \sum_{i=1}^{5} \delta_i F_{i,t} + \varepsilon_{i,t}. \]

The dummy variables are $D_{i,t}$, which is the year variable, and $F_{i,t}$, which is the firm variable. We have dropped the last one in each group. The estimated model for the full specification is

\[ \ln C_{i,t} = 13.56 + .8866 \ln Q_{i,t} + 0.01261 \ln^2 Q_{i,t} + 0.1281 \ln P_{f\,i,t} - 0.8855\, LF_{i,t} + \text{time effects} + \text{firm effects}. \]

The year effects display a revealing pattern, as shown in Figure 7.1. This was a period of rapidly rising fuel prices, so the cost effects are to be expected. Since one year dummy variable is dropped, the effect shown is relative to this base year (1984).

FIGURE 7.1  Estimated Year Dummy Variable Coefficients. (Estimated year specific effects plotted against year, 1969 to 1984.)

We are interested in whether the firm effects, the time effects, both, or neither are statistically significant. Table 7.2 presents the sums of squares from the four regressions. The $F$ statistic for the hypothesis that there are no firm specific effects is 65.94, which is highly significant. The statistic for the time effects is only 2.61, which is larger than the critical value of 1.84, but perhaps less so than Figure 7.1 might have suggested. In the absence of the year specific dummy variables, the year specific effects are probably largely absorbed by the price of fuel.

TABLE 7.2  F tests for Firm and Year Effects

Model           Sum of Squares    Parameters    F        Deg.Fr.
Full Model      0.17257           24            —
Time Effects    1.03470           19            65.94    [5, 66]
Firm Effects    0.26815           10             2.61    [14, 66]
No Effects      1.27492            5            22.19    [19, 66]

7.2.4 THRESHOLD EFFECTS AND CATEGORICAL VARIABLES

In most applications, we use dummy variables to account for purely qualitative factors, such as membership in a group, or to represent a particular time period. There are cases, however, in which the dummy variable(s) represents levels of some underlying factor that might have been measured directly if this were possible. For example, education is a case in which we typically observe certain thresholds rather than, say, years of education. Suppose, for example, that our interest is in a regression of the form

\[ \text{income} = \beta_1 + \beta_2\,\text{age} + \text{effect of education} + \varepsilon. \]

The data on education might consist of the highest level of education attained, such as high school (HS), undergraduate (B), master’s (M), or Ph.D. (P). An obviously unsatisfactory way to proceed is to use a variable $E$ that is 0 for the first group, 1 for the second, 2 for the third, and 3 for the fourth. That is,

\[ \text{income} = \beta_1 + \beta_2\,\text{age} + \beta_3 E + \varepsilon. \]

The difficulty with this approach is that it assumes that the increment in income at each threshold is the same; $\beta_3$ is the difference between income with a Ph.D. and a master’s and between a master’s and a bachelor’s degree. This is unlikely and unduly restricts the regression. A more flexible model would use three (or four) binary variables, one for each level of education. Thus, we would write

\[ \text{income} = \beta_1 + \beta_2\,\text{age} + \delta_B B + \delta_M M + \delta_P P + \varepsilon. \]

The correspondence between the coefficients and income for a given age is

High school:  $E[\text{income} \mid \text{age}, \text{HS}] = \beta_1 + \beta_2\,\text{age}$,
Bachelor’s:   $E[\text{income} \mid \text{age}, \text{B}] = \beta_1 + \beta_2\,\text{age} + \delta_B$,
Master’s:     $E[\text{income} \mid \text{age}, \text{M}] = \beta_1 + \beta_2\,\text{age} + \delta_M$,
Ph.D.:        $E[\text{income} \mid \text{age}, \text{P}] = \beta_1 + \beta_2\,\text{age} + \delta_P$.

The differences between, say, $\delta_P$ and $\delta_M$ and between $\delta_M$ and $\delta_B$ are of interest. Obviously, these are simple to compute. An alternative way to formulate the equation that reveals these differences directly is to redefine the dummy variables to be 1 if the individual has the degree, rather than whether the degree is the highest degree obtained. Thus, for someone with a Ph.D., all three binary variables are 1, and so on. By defining the variables in this fashion, the regression is now

High school:  $E[\text{income} \mid \text{age}, \text{HS}] = \beta_1 + \beta_2\,\text{age}$,
Bachelor’s:   $E[\text{income} \mid \text{age}, \text{B}] = \beta_1 + \beta_2\,\text{age} + \delta_B$,
Master’s:     $E[\text{income} \mid \text{age}, \text{M}] = \beta_1 + \beta_2\,\text{age} + \delta_B + \delta_M$,
Ph.D.:        $E[\text{income} \mid \text{age}, \text{P}] = \beta_1 + \beta_2\,\text{age} + \delta_B + \delta_M + \delta_P$.

Instead of the difference between a Ph.D. and the base case, in this model $\delta_P$ is the marginal value of the Ph.D. How equations with dummy variables are formulated is a matter of convenience. All the results can be obtained from a basic equation.

7.2.5 SPLINE REGRESSION

If one is examining income data for a large cross section of individuals of varying ages in a population, then certain patterns with regard to some age thresholds will be clearly evident. In particular, throughout the range of values of age, income will be rising, but the slope might change at some distinct milestones, for example, at age 18, when the typical individual graduates from high school, and at age 22, when he or she graduates from college. The time profile of income for the typical individual in this population might appear as in Figure 7.2. Based on the discussion in the preceding paragraph, we could fit such a regression model just by dividing the sample into three subsamples. However, this would neglect the continuity of the proposed function. The result would appear more like the dotted figure than the continuous function we had in mind. Restricted regression and what is known as a spline function can be used to achieve the desired effect.²

FIGURE 7.2  Spline Function. (Income plotted against age, with knots at ages 18 and 22.)

The function we wish to estimate is

\[ E[\text{income} \mid \text{age}] = \begin{cases} \alpha^0 + \beta^0\,\text{age} & \text{if age} < 18, \\ \alpha^1 + \beta^1\,\text{age} & \text{if age} \ge 18 \text{ and age} < 22, \\ \alpha^2 + \beta^2\,\text{age} & \text{if age} \ge 22. \end{cases} \]

The threshold values, 18 and 22, are called knots. Let

\[ d_1 = 1 \text{ if age} \ge t_1^*, \qquad d_2 = 1 \text{ if age} \ge t_2^*, \]

where $t_1^* = 18$ and $t_2^* = 22$. To combine all three equations, we use

\[ \text{income} = \beta_1 + \beta_2\,\text{age} + \gamma_1 d_1 + \delta_1 d_1\,\text{age} + \gamma_2 d_2 + \delta_2 d_2\,\text{age} + \varepsilon. \tag{7-3} \]

This relationship is the dashed function in Figure 7.2. The slopes in the three segments are $\beta_2$, $\beta_2 + \delta_1$, and $\beta_2 + \delta_1 + \delta_2$. To make the function piecewise continuous, we require that the segments join at the knots, that is,

\[ \beta_1 + \beta_2 t_1^* = (\beta_1 + \gamma_1) + (\beta_2 + \delta_1) t_1^* \]

and

\[ (\beta_1 + \gamma_1) + (\beta_2 + \delta_1) t_2^* = (\beta_1 + \gamma_1 + \gamma_2) + (\beta_2 + \delta_1 + \delta_2) t_2^*. \]

These are linear restrictions on the coefficients. Collecting terms, the first one is

\[ \gamma_1 + \delta_1 t_1^* = 0 \quad \text{or} \quad \gamma_1 = -\delta_1 t_1^*. \]

Doing likewise for the second and inserting these in (7-3), we obtain

\[ \text{income} = \beta_1 + \beta_2\,\text{age} + \delta_1 d_1 (\text{age} - t_1^*) + \delta_2 d_2 (\text{age} - t_2^*) + \varepsilon. \]

Constrained least squares estimates are obtainable by multiple regression, using a constant and the variables

x₁ = age,
x₂ = age − 18 if age ≥ 18 and 0 otherwise,
x₃ = age − 22 if age ≥ 22 and 0 otherwise.

We can test the hypothesis that the slope of the function is constant with the joint test of the two restrictions $\delta_1 = 0$ and $\delta_2 = 0$.

² An important reference on this subject is Poirier (1974). An often-cited application appears in Garber and Poirier (1974).

7.3 NONLINEARITY IN THE VARIABLES

It is useful at this point to write the linear regression model in a very general form: Let $\mathbf{z} = z_1, z_2, \ldots, z_L$ be a set of $L$ independent variables; let $f_1, f_2, \ldots, f_K$ be $K$ linearly independent functions of $\mathbf{z}$; let $g(y)$ be an observable function of $y$; and retain the usual assumptions about the disturbance. The linear regression model is

\[ g(y) = \beta_1 f_1(\mathbf{z}) + \beta_2 f_2(\mathbf{z}) + \cdots + \beta_K f_K(\mathbf{z}) + \varepsilon \]
\[ = \beta_1 x_1 + \beta_2 x_2 + \cdots + \beta_K x_K + \varepsilon \tag{7-4} \]
\[ = \mathbf{x}'\boldsymbol{\beta} + \varepsilon. \]

By using logarithms, exponentials, reciprocals, transcendental functions, polynomials, products, ratios, and so on, this “linear” model can be tailored to any number of situations.

7.3.1 FUNCTIONAL FORMS

A commonly used form of regression model is the loglinear model,

\[ \ln y = \ln \alpha + \sum_k \beta_k \ln X_k + \varepsilon = \beta_1 + \sum_k \beta_k x_k + \varepsilon. \]

In this model, the coefficients are elasticities:

\[ \left( \frac{\partial y}{\partial x_k} \right) \left( \frac{x_k}{y} \right) = \frac{\partial \ln y}{\partial \ln x_k} = \beta_k. \tag{7-5} \]

In the loglinear equation, measured changes are in proportional or percentage terms; $\beta_k$ measures the percentage change in $y$ associated with a one percent change in $x_k$. This removes the units of measurement of the variables from consideration in using the regression model. An alternative approach sometimes taken is to measure the variables and associated changes in standard deviation units. If the data are “standardized” before estimation using $x_{ik}^* = (x_{ik} - \bar{x}_k)/s_k$ and likewise for $y$, then the least squares regression coefficients measure changes in standard deviation units rather than natural or percentage terms. (Note that the constant term disappears from this regression.) It is not necessary actually to transform the data to produce these results; multiplying each least squares coefficient $b_k$ in the original regression by $s_k/s_y$ produces the same result, since dividing $y - \bar{y} = \sum_k b_k (x_k - \bar{x}_k) + e$ through by $s_y$ gives the coefficient $b_k s_k/s_y$ on $x_k^*$.

A hybrid of the linear and loglinear models is the semilog equation

\[ \ln y = \beta_1 + \beta_2 x + \varepsilon. \]
(7-6)

We used this form in the investment equation in Section 6.2,

\[ \ln I_t = \beta_1 + \beta_2 (i_t - \Delta p_t) + \beta_3 \Delta p_t + \beta_4 \ln Y_t + \beta_5 t + \varepsilon_t, \]

where the log of investment is modeled in the levels of the real interest rate, the price level, and a time trend. In a semilog equation with a time trend such as this one, $d \ln I/dt = \beta_5$ is the average rate of growth of $I$. The estimated value of −.005 in Table 6.1 suggests that over the full estimation period, after accounting for all other factors, the average rate of growth of investment was −.5 percent per year.

The coefficients in the semilog model are partial- or semi-elasticities; in (7-6), $\beta_2$ is $\partial \ln y/\partial x$. This is a natural form for models with dummy variables such as the earnings equation in Example 7.1. The coefficient on Kids of −.35 suggests that all else equal, earnings are approximately 35 percent less when there are children in the household.

The quadratic earnings equation in Example 7.1 shows another use of nonlinearities in the variables. Using the results in Example 7.1, we find that for a woman with 12 years of schooling and children in the household, the age-earnings profile appears as in Figure 7.3. This figure suggests an important question in this framework. It is tempting to conclude that Figure 7.3 shows the earnings trajectory of a person at different ages, but that is not what the data provide. The model is based on a cross section, and what it displays is the earnings of different people of different ages. How this profile relates to the expected earnings path of one individual is a different, and complicated question.

FIGURE 7.3  Age-Earnings Profile. (Earnings profile by age: earnings, 500 to 3500, plotted against age, 20 to 65.)

Another useful formulation of the regression model is one with interaction terms. For example, a model relating braking distance $D$ to speed $S$ and road wetness $W$ might be

\[ D = \beta_1 + \beta_2 S + \beta_3 W + \beta_4 SW + \varepsilon. \]

In this model,

\[ \frac{\partial E[D \mid S, W]}{\partial S} = \beta_2 + \beta_4 W, \]

which implies that the marginal effect of higher speed on braking distance is increased when the road is wetter (assuming that $\beta_4$ is positive). If it is desired to form confidence intervals or test hypotheses about these marginal effects, then the necessary standard error is computed from

\[ \operatorname{Var}\left( \frac{\partial \hat{E}[D \mid S, W]}{\partial S} \right) = \operatorname{Var}[\hat{\beta}_2] + W^2 \operatorname{Var}[\hat{\beta}_4] + 2W \operatorname{Cov}[\hat{\beta}_2, \hat{\beta}_4], \]

and similarly for $\partial E[D \mid S, W]/\partial W$. A value must be inserted for $W$. The sample mean is a natural choice, but for some purposes, a specific value, such as an extreme value of $W$ in this example, might be preferred.

7.3.2 IDENTIFYING NONLINEARITY

If the functional form is not known a priori, then there are a few approaches that may help at least to identify any nonlinearity and provide some information about it from the sample. For example, if the suspected nonlinearity is with respect to a single regressor in the equation, then fitting a quadratic or cubic polynomial rather than a linear function may capture some of the nonlinearity. By choosing several ranges for the regressor in question and allowing the slope of the function to be different in each range, a piecewise linear approximation to the nonlinear function can be fit.

Example 7.3  Functional Form for a Nonlinear Cost Function

In a celebrated study of economies of scale in the U.S. electric power industry, Nerlove (1963) analyzed the production costs of 145 American electric generating companies. This study produced several innovations in microeconometrics. It was among the first major applications of statistical cost analysis. The theoretical development in Nerlove’s study was the first to show how the fundamental theory of duality between production and cost functions could be used to frame an econometric model. Finally, Nerlove employed several useful techniques to sharpen his basic model.

The focus of the paper was economies of scale, typically modeled as a characteristic of the production function. He chose a Cobb–Douglas function to model output as a function of capital, $K$, labor, $L$, and fuel, $F$;

\[ Q = \alpha_0 K^{\alpha_K} L^{\alpha_L} F^{\alpha_F} e^{\varepsilon_i}, \]

where $Q$ is output and $\varepsilon_i$ embodies the unmeasured differences across firms. The economies of scale parameter is $r = \alpha_K + \alpha_L + \alpha_F$. The value one indicates constant returns to scale. In this study, Nerlove investigated the widely accepted assumption that producers in this industry enjoyed substantial economies of scale. The production model is loglinear, so assuming that other conditions of the classical regression model are met, the four parameters could be estimated by least squares. However, he argued that the three factors could not be treated as exogenous variables. For a firm that optimizes by choosing its factors of production, the demand for fuel would be $F^* = F^*(Q, P_K, P_L, P_F)$ and likewise for labor and capital, so certainly the assumptions of the classical model are violated.

In the regulatory framework in place at the time, state commissions set rates and firms met the demand forthcoming at the regulated prices. Thus, it was argued that output (as well as the factor prices) could be viewed as exogenous to the firm and, based on an argument by Zellner, Kmenta, and Dreze (1964), Nerlove argued that at equilibrium, the deviation of costs from the long run optimum would be independent of output. (This has a testable implication which we will explore in Chapter 14.) Thus, the firm’s objective was cost minimization subject to the constraint of the production function. This can be formulated as a Lagrangean problem,

\[ \min_{K, L, F}\; P_K K + P_L L + P_F F + \lambda (Q - \alpha_0 K^{\alpha_K} L^{\alpha_L} F^{\alpha_F}). \]

The solution to this minimization problem is the three factor demands and the multiplier (which measures marginal cost). Inserted back into total costs, this produces an (intrinsically linear) loglinear cost function,

\[ P_K K + P_L L + P_F F = C(Q, P_K, P_L, P_F) = r A Q^{1/r} P_K^{\alpha_K/r} P_L^{\alpha_L/r} P_F^{\alpha_F/r} e^{\varepsilon_i/r} \]

or

\[ \ln C = \beta_1 + \beta_q \ln Q + \beta_K \ln P_K + \beta_L \ln P_L + \beta_F \ln P_F + u_i, \tag{7-7} \]

where $\beta_q = 1/(\alpha_K + \alpha_L + \alpha_F)$ is now the parameter of interest and $\beta_j = \alpha_j/r$, $j = K, L, F$.³ Thus, the duality between production and cost functions has been used to derive the estimating equation from first principles.

A complication remains. The cost parameters must sum to one; $\beta_K + \beta_L + \beta_F = 1$, so estimation must be done subject to this constraint.⁴ This restriction can be imposed by regressing $\ln(C/P_F)$ on a constant, $\ln Q$, $\ln(P_K/P_F)$, and $\ln(P_L/P_F)$. This first set of results appears at the top of Table 7.3.

³ Readers who attempt to replicate the original study should note that Nerlove used common (base 10) logs in his calculations, not natural logs. This change creates some numerical differences.

⁴ In the context of the econometric model, the restriction has a testable implication by the definition in Chapter 6. But, the underlying economics require this restriction; it was used in deriving the cost function. Thus, it is unclear what is implied by a test of the restriction. Presumably, if the hypothesis of the restriction is rejected, the analysis should stop at that point, since without the restriction, the cost function is not a valid representation of the production function. We will en
counterthisconundrumagaininanotherforminChapter14.Fortunately,inthisinstance,thehypothesisisnotrejected.(ItisintheapplicationinChapter14.)\nGreene-50240bookJune11,200218:46126CHAPTER7✦FunctionalFormandStructuralChangeTABLE7.3Cobb–DouglasCostFunctions(StandardErrorsinParentheses)logQlogP−logPlogP−logPR2LFKFAllfirms0.7210.594−0.00850.952(0.0174)(0.205)(0.191)Group10.3980.641−0.0930.512Group20.6680.1050.3640.635Group30.9310.4080.2490.571Group40.9150.4720.1330.871Group51.0450.604−0.2950.920InitialestimatesoftheparametersofthecostfunctionareshowninthetoprowofTable7.3.Thehypothesisofconstantreturnstoscalecanbefirmlyrejected.Thetratiois(0.721−1)/0.0174=−16.03,soweconcludethatthisestimateissignificantlylessthanoneor,byimplication,rissignificantlygreaterthanone.Notethatthecoefficientonthecap-italpriceisnegative.Intheory,thisshouldequalαK/r,which(unlessthemarginalproductofcapitalisnegative),shouldbepositive.Nerloveattributedthistomeasurementerrorinthecapitalpricevariable.Thisseemsplausible,butitcarrieswithittheimplicationthattheothercoefficientsaremismeasuredaswell.[See(5-31a,b).ChristensenandGreene’s(1976)estimatorofthismodelwiththesedataproducedapositiveestimate.SeeSection14.3.1.]ThestrikingpatternoftheresidualsshowninFigure7.45andsomethoughtabouttheimpliedformoftheproductionfunctionsuggestedthatsomethingwasmissingfromthemodel.6Intheory,theestimatedmodelimpliesacontinuallydecliningaveragecostcurve,whichinturnimpliespersistenteconomiesofscaleatalllevelsofoutput.ThisconflictswiththetextbooknotionofaU-shapedaveragecostcurveandappearsimplausibleforthedata.Notethethreeclustersofresidualsinthefigure.Twoapproacheswereusedtoanalyzethemodel.Bysortingthesampleintofivegroupsonthebasisofoutputandfittingseparateregres-sionstoeachgroup,Nerlovefitapiecewiseloglinearmodel.TheresultsaregiveninthelowerrowsofTable7.3,wherethefirmsinthesuccessivegroupsareprogressivelylarger.Theresultsarepersuasivethatthe(log)-linearcostfunctionisinadequate.Theoutputcoef-ficientthatrisestowardandthencrosses1.0i
sconsistentwithaU-shapedcostcurveassurmisedearlier.Asecondapproachwastoexpandthecostfunctiontoincludeaquadraticterminlogoutput.ThisapproachcorrespondstoamuchmoregeneralmodelandproducedtheresultgiveninTable7.4.Again,asimpletteststronglysuggeststhatincreasedgeneralityiscalledfor;t=0.117/0.012=9.75.Theoutputelasticityinthisquadraticmodelisβ+2γlogQ.7qqqThereareeconomiesofscalewhenthisvalueislessthanoneandconstantreturnstoscalewhenitequalsone.Usingthetwovaluesgiveninthetable(0.151and0.117,respectively),wefindthatthisfunctiondoes,indeed,produceaUshapedaveragecostcurvewithminimumatlog10Q=(1−0.151)/(2×0.117)=3.628,orQ=4248,whichwasroughlyinthemiddleoftherangeofoutputsforNerlove’ssampleoffirms.5Theresidualsarecreatedasdeviationsofpredictedtotalcostfromactual,sotheydonotsumtozero.6ADurbin–Watsontestofcorrelationamongtheresiduals(seeSection12.5.1)revealedtotheauthorasubstantialautocorrelation.Althoughnormallyusedwithtimeseriesdata,theDurbin–Watsonstatisticandatestfor“autocorrelation”canbeausefultoolfordeterminingtheappropriatefunctionalforminacrosssectionalmodel.Tousethisapproach,itisnecessarytosorttheobservationsbasedonavariableofinterest(output).Severalclustersofresidualsofthesamesignsuggestedaneedtoreexaminetheassumedfunctionalform.7Nerloveinadvertentlymeasuredeconomiesofscalefromthisfunctionas1/(βq+δlogQ),whereβqandδarethecoefficientsonlogQandlog2Q.Thecorrectexpressionwouldhavebeen1/[∂logC/∂logQ]=1/[βq+2δlogQ].Thisslipwasperiodicallyrediscoveredinseverallaterpapers.\nGreene-50240bookJune11,200218:46CHAPTER7✦FunctionalFormandStructuralChange127ResidualsfromTotalCost2.01.51.0Residual.5.0.5024681012LOGQFIGURE7.4ResidualsfromPredictedCost.ThisstudywasupdatedbyChristensenandGreene(1976).Usingthesamedatabutamoreelaborate(translog)functionalformandbysimultaneouslyestimatingthefactorde-mandsandthecostfunction,theyfoundresultsbroadlysimilartoNerlove’s.TheirpreferredfunctionalformdidsuggestthatNerlove’sgeneralizedmodelinTable7.4didsomewhatun-derestimatetherangeofoutputsinwhichunitco
stsofproductionwouldcontinuetodecline.Theyalsoredidthestudyusingasampleof123firmsfrom1970,andfoundsimilarresults.Inthelattersample,however,itappearedthatmanyfirmshadexpandedrapidlyenoughtoexhausttheavailableeconomiesofscale.Wewillrevisitthe1970datasetinastudyofefficiencyinSection17.6.4.Theprecedingexampleillustratesthreeusefultoolsinidentifyinganddealingwithunspecifiednonlinearity:analysisofresiduals,theuseofpiecewiselinearregression,andtheuseofpolynomialstoapproximatetheunknownregressionfunction.7.3.3INTRINSICLINEARITYANDIDENTIFICATIONTheloglinearmodelillustratesanintermediatecaseofanonlinearregressionmodel.Theequationisintrinsicallylinearbyourdefinition;bytakinglogsofY=αXβ2eεi,weiiobtainlnYi=lnα+β2lnXi+εi(7-8)TABLE7.4Log-QuadraticCostFunction(StandardErrorsinParentheses)logQlog2Qlog(P/P)log(P/P)R2LFKFAllfirms0.1510.1170.498−0.0620.95(0.062)(0.012)(0.161)(0.151)\nGreene-50240bookJune11,200218:46128CHAPTER7✦FunctionalFormandStructuralChangeoryi=β1+β2xi+εi.Althoughthisequationislinearinmostrespects,somethinghaschangedinthatitisnolongerlinearinα.Writtenintermsofβ1,weobtainafullylinearmodel.Butthatmaynotbetheformofinterest.Nothingislost,ofcourse,sinceβ1isjustlnα.Ifβ1canbeestimated,thenanobviousestimateofαissuggested.Thisfactleadsustoasecondaspectofintrinsicallylinearmodels.Maximumlike-lihoodestimatorshavean“invarianceproperty.”Intheclassicalnormalregressionmodel,themaximumlikelihoodestimatorofσisthesquarerootofthemaximumlike-lihoodestimatorofσ2.Undersomeconditions,leastsquaresestimatorshavethesameproperty.Byexploitingthis,wecanbroadenthedefinitionoflinearityandincludesomeadditionalcasesthatmightotherwisebequitecomplex.DEFINITION7.1IntrinsicLinearityIntheclassicallinearregressionmodel,iftheKparametersβ1,β2,...,βKcanbewrittenasKone-to-one,possiblynonlinearfunctionsofasetofKunderlyingparametersθ1,θ2,...,θK,thenthemodelisintrinsicallylinearinθ.Example7.4IntrinsicallyLinearRegressionInSection17.5.4,wewillestimatetheparametersofthemodel(β+x)−ρρ−1−y/(β+x)f(y|β,x)=ye(ρ)bymaxim
umlikelihood.Inthismodel,E[y|x]=(βρ)+ρx,whichsuggestsanotherwaythatwemightestimatethetwoparameters.Thisfunctionisanintrinsicallylinearregressionmodel,E[y|x]=β1+β2x,inwhichβ1=βρandβ2=ρ.Wecanestimatetheparametersbyleastsquaresandthenretrievetheestimateofβusingb1/b2.Sincethisvalueisanonlinearfunctionoftheestimatedparameters,weusethedeltamethodtoestimatethestandarderror.Usingthedatafromthatexample,theleastsquaresestimatesofβ1andβ2(withstandarderrorsinparentheses)are−4.1431(23.734)and2.4261(1.5915).Theestimatedcovarianceis−36.979.Theestimateofβis−4.1431/2.4261=−1.7077.Weestimatethesamplingvarianceofβˆwith22∂βˆ∂βˆ∂βˆ∂βˆEst.Var[βˆ]=Var[b1]+Var[b2]+2Cov[b1,b2]∂b1∂b2∂b1∂b22=8.6889.Table7.5comparestheleastsquaresandmaximumlikelihoodestimatesoftheparameters.Thelowerstandarderrorsforthemaximumlikelihoodestimatesresultfromtheinefficient(equal)weightinggiventotheobservationsbytheleastsquaresprocedure.Thegammadistributionishighlyskewed.Inaddition,weknowfromourresultsinAppendixCthatthisdistributionisanexponentialfamily.Wefoundforthegammadistributionthatthesufficientstatisticsforthisdensitywereiyiandilnyi.Theleastsquaresestimatordoesnotusethesecondofthese,whereasanefficientestimatorwill.\nGreene-50240bookJune11,200218:46CHAPTER7✦FunctionalFormandStructuralChange129TABLE7.5EstimatesoftheRegressioninaGammaModel:LeastSquaresversusMaximumLikelihoodβρEstimateStandardErrorEstimateStandardErrorLeastsquares−1.7088.6892.4261.592Maximumlikelihood−4.7192.4033.1510.663Theemphasisinintrinsiclinearityison“onetoone.”Iftheconditionsaremet,thenthemodelcanbeestimatedintermsofthefunctionsβ1,...,βK,andtheunderlyingparametersderivedaftertheseareestimated.Theone-to-onecorrespondenceisanidentificationcondition.Iftheconditionismet,thentheunderlyingparametersoftheregression(θ)aresaidtobeexactlyidentifiedintermsoftheparametersofthelinearmodelβ.AnexcellentexampleisprovidedbyKmenta(1986,p.515).Example7.5CESProductionFunctionTheconstantelasticityofsubstitutionproductionfunctionmaybewrittenν−ρ−ρlny=lnγ−ln[δK+(1−
δ)L]+ε.(7-9)ρATaylorseriesapproximationtothisfunctionaroundthepointρ=0islny=lnγ+νδlnK+ν(1−δ)lnL+ρνδ(1−δ)−1[lnK−lnL]2+ε2=β1x1+β2x2+β3x3+β4x4+ε,(7-10)12wherex1=1,x2=lnK,x3=lnL,x4=−2ln(K/L),andthetransformationsareβ1=lnγ,β2=νδ,β3=ν(1−δ),β4=ρνδ(1−δ),(7-11)γ=eβ1,δ=β/(β+β),ν=β+β,ρ=β(β+β)/(ββ).2232342323Estimatesofβ1,β2,β3,andβ4canbecomputedbyleastsquares.Theestimatesofγ,δ,ν,andρobtainedbythesecondrowof(7-11)arethesameasthosewewouldobtainhadwefoundthenonlinearleastsquaresestimatesof(7-10)directly.AsKmentashows,however,theyarenotthesameasthenonlinearleastsquaresestimatesof(7-9)duetotheuseoftheTaylorseriesapproximationtogetto(7-10).Wewouldusethedeltamethodtoconstructtheestimatedasymptoticcovariancematrixfortheestimatesofθ=[γ,δ,ν,ρ].Thederivativesmatrixisβe1000∂θ0β2/(β+β)2−β/(β+β)2023223C==.∂β01100−βββ2β−ββββ2(β+β)/(ββ)342324232323TheestimatedcovariancematrixforθˆisCˆ[s2(XX)−1]Cˆ.Notallmodelsoftheformyi=β1(θ)xi1+β2(θ)xi2+···+βK(θ)xik+εi(7-12)areintrinsicallylinear.Recallthattheconditionthatthefunctionsbeonetoone(i.e.,thattheparametersbeexactlyidentified)wasrequired.Forexample,yi=α+βxi1+γxi2+βγxi3+εi\nGreene-50240bookJune11,200218:46130CHAPTER7✦FunctionalFormandStructuralChangeisnonlinear.Thereasonisthatifwewriteitintheformof(7-12),wefailtoaccountfortheconditionthatβ4equalsβ2β3,whichisanonlinearrestriction.Inthismodel,thethreeparametersα,β,andγareoveridentifiedintermsofthefourparametersβ1,β2,β3,andβ4.Unrestrictedleastsquaresestimatesofβ2,β3,andβ4canbeusedtoobtaintwoestimatesofeachoftheunderlyingparameters,andthereisnoassurancethatthesewillbethesame.7.4MODELINGANDTESTINGFORASTRUCTURALBREAKOneofthemorecommonapplicationsoftheFtestisintestsofstructuralchange.8Inspecifyingaregressionmodel,weassumethatitsassumptionsapplytoalltheobser-vationsinoursample.Itisstraightforward,however,totestthehypothesisthatsomeoforalltheregressioncoefficientsaredifferentindifferentsubsetsofthedata.Toanalyzeanumberofexamples,wewillrevisitthedataontheU.S.gasolinemarket9thatweexaminedinExample2.3.AsFigure7
.5followingsuggests,thismarketbehavedinpredictable,unremarkablefashionpriortotheoilshockof1973andwasquitevolatilethereafter.Thelargejumpsinpricein1973and1980areclearlyvisible,asisthemuchgreatervariabilityinconsumption.Itseemsunlikelythatthesameregressionmodelwouldapplytobothperiods.7.4.1DIFFERENTPARAMETERVECTORSThegasolineconsumptiondataspantwoverydifferentperiods.Upto1973,fuelwasplentifulandworldpricesforgasolinehadbeenstableorfallingforatleasttwodecades.Theembargoof1973markedatransitioninthismarket(atleastforadecadeorso),markedbyshortages,risingprices,andintermittentturmoil.Itispossiblethattheen-tirerelationshipdescribedbyourregressionmodelchangedin1974.Totestthisasahypothesis,wecouldproceedasfollows:Denotethefirst14yearsofthedatainyandXasy1andX1andtheremainingyearsasy2andX2.Anunrestrictedregressionthatallowsthecoefficientstobedifferentinthetwoperiodsisy1X10β1ε1=+.(7-13)y20X2β2ε2DenotingthedatamatricesasyandX,wefindthattheunrestrictedleastsquaresestimatoris−1XX0Xyb−111111b=(XX)Xy==,(7-14)0XXXyb22222whichisleastsquaresappliedtothetwoequationsseparately.Therefore,thetotalsumofsquaredresidualsfromthisregressionwillbethesumofthetworesidualsumsof8ThistestisoftenlabeledaChowtest,inreferencetoChow(1960).9ThedataarelistedinAppendixTableA6.1.\nGreene-50240bookJune11,200218:46CHAPTER7✦FunctionalFormandStructuralChange1314.54.03.53.02.5PG2.01.51.0.5708090100110120GFIGURE7.5GasolinePriceandPerCapitaConsumption,1960–1995.squaresfromthetwoseparateregressions:ee=ee+ee.1122Therestrictedcoefficientvectorcanbeobtainedintwoways.Formally,therestrictionβ1=β2isRβ=q,whereR=[I:−I]andq=0.Thegeneralresultgivenearliercanbeapplieddirectly.Aneasierwaytoproceedistobuildtherestrictiondirectlyintothemodel.Ifthetwocoefficientvectorsarethesame,then(7-13)maybewritteny1X1ε1=β+,y2X2ε2andtherestrictedestimatorcanbeobtainedsimplybystackingthedataandestimatingasingleregression.Theresidualsumofsquaresfromthisrestrictedregression,ee∗∗thenformsthebasisforthetest.Theteststatisticisthengivenin(6-6),whereJ,th
enumberofrestrictions,isthenumberofcolumnsinX2andthedenominatordegreesoffreedomisn1+n2−2k.7.4.2INSUFFICIENTOBSERVATIONSInsomecircumstances,thedataseriesarenotlongenoughtoestimateoneortheotheroftheseparateregressionsforatestofstructuralchange.Forexample,onemightsurmisethatconsumerstookayearortwotoadjusttotheturmoilofthetwooilpriceshocksin1973and1979,butthatthemarketneveractuallyfundamentallychangedorthatitonlychangedtemporarily.Wemightconsiderthesametestasbefore,butnowonlysingleoutthefouryears1974,1975,1980,and1981forspecialtreatment.Sincetherearesixcoefficientstoestimatebutonlyfourobservations,itisnotpossibletofit\nGreene-50240bookJune11,200218:46132CHAPTER7✦FunctionalFormandStructuralChangethetwoseparatemodels.Fisher(1970)hasshownthatinsuchacircumstance,avalidwaytoproceedisasfollows:1.Estimatetheregression,usingthefulldataset,andcomputetherestrictedsumofsquaredresiduals,ee.∗∗2.Usethelonger(adequate)subperiod(n1observations)toestimatetheregression,andcomputetheunrestrictedsumofsquares,ee.Thislattercomputationis11doneassumingthatwithonlyn20.a.Provethatthetheorynotwithstanding,theleastsquaresestimatescandc∗arerelatedby(y¯−y¯)(1−R2)1c∗=−c,(4)(1−P)1−r2yd\nGreene-50240bookJune11,200218:46146CHAPTER7✦FunctionalFormandStructuralChangewherey¯1=meanofyforobservationswithd=1,y¯=meanofyforallobservations,P=meanofd,R2=coefficientofdeterminationfor(1),r2=squaredcorrelationbetweenyandd.yd[Hint:Themodelcontainsaconstantterm.Thus,tosimplifythealgebra,assumethatallvariablesaremeasuredasdeviationsfromtheoverallsamplemeansanduseapartitionedregressiontocomputethecoefficientsin(3).Second,in(2),usetheresultthatbasedontheleastsquaresresultsy=ai+Xb+cd+e,soq=y−cd−e.Fromhereon,wedroptheconstantterm.Thus,intheregressionin(3)youareregressing[y−cd−e]onyandd.b.Willthesampleevidencenecessarilybeconsistentwiththetheory?[Hint:Sup-posethatc=0.]AsymposiumontheConwayandRobertspaperappearedintheJournalofBusinessandEconomicStatisticsinApril1983.5.Reverseregressioncontinued.Thisandthenextexercisecontinue
the analysis of Exercise 4. In Exercise 4, interest centered on a particular dummy variable in which the regressors were accurately measured. Here we consider the case in which the crucial regressor in the model is measured with error. The paper by Kamlich and Polachek (1982) is directed toward this issue. Consider the simple errors in the variables model,

$$y = \alpha + \beta x^* + \varepsilon, \qquad x = x^* + u,$$

where $u$ and $\varepsilon$ are uncorrelated and $x$ is the erroneously measured, observed counterpart to $x^*$.

a. Assume that $x^*$, $u$, and $\varepsilon$ are all normally distributed with means $\mu^*$, 0, and 0, variances $\sigma_*^2$, $\sigma_u^2$, and $\sigma_\varepsilon^2$, and zero covariances. Obtain the probability limits of the least squares estimators of $\alpha$ and $\beta$.

b. As an alternative, consider regressing $x$ on a constant and $y$, and then computing the reciprocal of the estimate. Obtain the probability limit of this estimator.

c. Do the "direct" and "reverse" estimators bound the true coefficient?

6. Reverse regression continued. Suppose that the model in Exercise 5 is extended to $y = \beta x^* + \gamma d + \varepsilon$, $x = x^* + u$. For convenience, we drop the constant term. Assume that $x^*$, $\varepsilon$, and $u$ are independent normally distributed with zero means. Suppose that $d$ is a random variable that takes the values one and zero with probabilities $\pi$ and $1 - \pi$ in the population and is independent of all other variables in the model. To put this formulation in context, the preceding model (and variants of it) has appeared in the literature on discrimination. We view $y$ as a "wage" variable, $x^*$ as "qualifications," and $x$ as some imperfect measure such as education. The dummy variable $d$ is membership ($d = 1$) or nonmembership ($d = 0$) in some protected class. The hypothesis of discrimination turns on $\gamma < 0$ versus $\gamma \geq 0$.

a. What is the probability limit of $c$, the least squares estimator of $\gamma$, in the least squares regression of $y$ on $x$ and $d$? [Hints: The independence of $x^*$ and $d$ is important. Also, $\text{plim}\, d'd/n = \text{Var}[d] + E^2[d] = \pi(1 - \pi) + \pi^2 = \pi$. This minor modification does not affect the model substantively, but it greatly simplifies the algebra.] Now suppose that $x^*$ and $d$ are not independent. In particular, suppose that $E[x^* \mid d = 1] = \mu_1$ and $E[x^* \mid d = 0] = \mu_0$. Repeat the derivation with this assumption.

b. Consider, instead, a regression of $x$ on $y$ and $d$. What is the probability limit of the coefficient on $d$ in this regression? Assume that $x^*$ and $d$ are independent.

c. Suppose that $x^*$ and $d$ are not independent, but $\gamma$ is, in fact, less than zero. Assuming that both preceding equations still hold, what is estimated by $(\bar{y} \mid d = 1) - (\bar{y} \mid d = 0)$? What does this quantity estimate if $\gamma$ does equal zero?

7. Data on the number of incidents of damage to a sample of ships, with the type of ship and the period when it was constructed, are given in Table 7.8. There are five types of ships and four different periods of construction. Use $F$ tests and dummy variable regressions to test the hypothesis that there is no significant "ship type effect" in the expected number of incidents. Now, use the same procedure to test whether there is a significant "period effect."

TABLE 7.8 Ship Damage Incidents

              Period Constructed
Ship Type     1960–1964   1965–1969   1970–1974   1975–1979
A             0           4           18          11
B             29          53          44          18
C             1           1           2           1
D             0           0           11          4
E             0           7           12          1

Source: Data from McCullagh and Nelder (1983, p. 137).

8 SPECIFICATION ANALYSIS AND MODEL SELECTION

8.1 INTRODUCTION

Chapter 7 presented results which were primarily focused on sharpening the functional form of the model. Functional form and hypothesis testing are directed toward improving the specification of the model or using that model to draw generally narrow inferences about the population. In this chapter we turn to some broader techniques that relate to choosing a specific model when there is more than one competing candidate. Section 8.2 describes some larger issues related to the use of the multiple regression model, specifically the impacts of an incomplete or excessive specification on estimation and inference. Sections 8.3 and 8.4 turn to the broad question of statistical methods for choosing among alternative models.

8.2 SPECIFICATION ANALYSIS AND MODEL BUILDING

Our analysis has been based on the assumption that the correct specification of the regression model is known to be

$$y = X\beta + \varepsilon. \qquad (8\text{-}1)$$

There are numerous types of errors that one might make in the specification of the estimated equation. Perhaps the most common ones are the omission of relevant variables and the inclusion of superfluous variables.

8.2.1 BIAS CAUSED BY OMISSION OF RELEVANT VARIABLES

Suppose that a correctly specified regression model would be

$$y = X_1\beta_1 + X_2\beta_2 + \varepsilon, \qquad (8\text{-}2)$$

where the two parts of $X$ have $K_1$ and $K_2$ columns, respectively. If we regress $y$ on $X_1$ without including $X_2$,
then the estimator is

$$b_1 = (X_1'X_1)^{-1}X_1'y = \beta_1 + (X_1'X_1)^{-1}X_1'X_2\beta_2 + (X_1'X_1)^{-1}X_1'\varepsilon. \qquad (8\text{-}3)$$

Taking the expectation, we see that unless $X_1'X_2 = 0$ or $\beta_2 = 0$, $b_1$ is biased. The well-known result is the omitted variable formula:

$$E[b_1 \mid X] = \beta_1 + P_{1.2}\beta_2, \qquad (8\text{-}4)$$

where

$$P_{1.2} = (X_1'X_1)^{-1}X_1'X_2. \qquad (8\text{-}5)$$

Each column of the $K_1 \times K_2$ matrix $P_{1.2}$ is the column of slopes in the least squares regression of the corresponding column of $X_2$ on the columns of $X_1$.

Example 8.1 Omitted Variables

If a demand equation is estimated without the relevant income variable, then (8-4) shows how the estimated price elasticity will be biased. Letting $b$ be the estimator, we obtain

$$E[b \mid \text{price}, \text{income}] = \beta + \frac{\text{Cov}[\text{price}, \text{income}]}{\text{Var}[\text{price}]}\,\gamma,$$

where $\gamma$ is the income coefficient. In aggregate data, it is unclear whether the missing covariance would be positive or negative. The sign of the bias in $b$ would be the same as this covariance, however, because Var[price] and $\gamma$ would be positive.

The gasoline market data we have examined in Examples 2.3 and 7.6 provide a striking example. Figure 7.5 showed a simple plot of per capita gasoline consumption, G/pop, against the price index PG. The plot is considerably at odds with what one might expect. But a look at the data in Appendix Table F2.2 shows clearly what is at work. Holding per capita income, I/pop, and other prices constant, these data might well conform to expectations. In these data, however, income is persistently growing, and the simple correlations between G/pop and I/pop and between PG and I/pop are 0.86 and 0.58, respectively, which are quite large. To see if the expected relationship between price and consumption shows up, we will have to purge our data of the intervening effect of I/pop. To do so, we rely on the Frisch–Waugh result in Theorem 3.3. The regression results appear in Table 7.6. The first column shows the full regression model, with ln PG, log Income, and several other variables. The estimated demand elasticity is −0.11553, which conforms with expectations. If income is omitted from this equation, the estimated price elasticity is +0.074499, which has the wrong sign but is what we would expect given the theoretical results above.

In this development, it is straightforward to deduce the directions of bias when there is a single included variable and one omitted variable. It is important to note, however, that if more than one variable is included, then the terms in the omitted variable formula involve multiple regression coefficients, which themselves have the signs of partial, not simple, correlations. For example, in the demand equation of the previous example, if the price of a closely related product had been included as well, then the simple correlation between price and income would be insufficient to determine the direction of the bias in the price elasticity. What would be required is the sign of the correlation between price and income net of the effect of the other price. This requirement might not be obvious, and it would become even less so as more regressors were added to the equation.

8.2.2 PRETEST ESTIMATION

The variance of $b_1$ is that of the third term in (8-3), which is

$$\text{Var}[b_1 \mid X] = \sigma^2(X_1'X_1)^{-1}. \qquad (8\text{-}6)$$

If we had computed the correct regression, including $X_2$, then the slopes on $X_1$ would have been unbiased and would have had a covariance matrix equal to the upper left block of $\sigma^2(X'X)^{-1}$. This matrix is

$$\text{Var}[b_{1.2} \mid X] = \sigma^2(X_1'M_2X_1)^{-1}, \qquad (8\text{-}7)$$

where

$$M_2 = I - X_2(X_2'X_2)^{-1}X_2',$$

or

$$\text{Var}[b_{1.2} \mid X] = \sigma^2\left[X_1'X_1 - X_1'X_2(X_2'X_2)^{-1}X_2'X_1\right]^{-1}.$$

We can compare the covariance matrices of $b_1$ and $b_{1.2}$ more easily by comparing their inverses [see result (A-120)];

$$\text{Var}[b_1 \mid X]^{-1} - \text{Var}[b_{1.2} \mid X]^{-1} = (1/\sigma^2)X_1'X_2(X_2'X_2)^{-1}X_2'X_1,$$

which is nonnegative definite. We conclude that although $b_1$ is biased, its variance is never larger than that of $b_{1.2}$ (since the inverse of its variance is at least as large).

Suppose, for instance, that $X_1$ and $X_2$ are each a single column and that the variables are measured as deviations from their respective means. Then

$$\text{Var}[b_1 \mid X] = \frac{\sigma^2}{s_{11}}, \quad \text{where } s_{11} = \sum_{i=1}^{n}(x_{i1} - \bar{x}_1)^2,$$

whereas

$$\text{Var}[b_{1.2} \mid X] = \sigma^2\left[x_1'x_1 - x_1'x_2(x_2'x_2)^{-1}x_2'x_1\right]^{-1} = \frac{\sigma^2}{s_{11}(1 - r_{12}^2)}, \qquad (8\text{-}8)$$

where

$$r_{12}^2 = \frac{(x_1'x_2)^2}{x_1'x_1\, x_2'x_2}$$

is the squared sample correlation between $x_1$ and $x_2$. The more highly correlated $x_1$ and $x_2$ are, the larger is the variance of $b_{1.2}$ compared with that of $b_1$. Therefore, it is possible that $b_1$ is a more precise estimator based on the mean-squared error criterion.

The result in the preceding paragraph poses a bit of a dilemma for applied researchers. The situation arises frequently in the search for a model specification. Faced with a variable that a researcher suspects should be in their model, but which is causing a problem of collinearity, the analyst faces a choice of omitting the relevant variable or including it and estimating its (and all the other variables') coefficient imprecisely. This presents a choice between two estimators, $b_1$ and $b_{1.2}$. In fact, what researchers usually do actually creates a third estimator. It is common to include the problem variable provisionally. If its $t$ ratio is sufficiently large, it is retained; otherwise it is discarded. This third estimator is called a pretest estimator. What is known about pretest estimators is not encouraging. Certainly they are biased. How badly depends on the unknown parameters. Analytical results suggest that the pretest estimator is the least precise of the three when the researcher is most likely to use it. [See Judge et al. (1985).]

8.2.3 INCLUSION OF IRRELEVANT VARIABLES

If the regression model is correctly given by

$$y = X_1\beta_1 + \varepsilon \qquad (8\text{-}9)$$

and we estimate it as if (8-2) were correct (i.e., we include some extra variables), then it might seem that the same sorts of problems considered earlier would arise. In fact, this is not the case. We can view the omission of a set of relevant variables as equivalent to imposing an incorrect restriction on (8-2). In particular, omitting $X_2$ is equivalent to incorrectly estimating (8-2) subject to the restriction $\beta_2 = 0$. As we discovered, incorrectly imposing a restriction produces a biased estimator. Another way to view this error is to note that it amounts to incorporating incorrect information in our estimation. Suppose, however, that our error is simply a failure to use some information that is correct.

The inclusion of the irrelevant variables $X_2$ in the regression is equivalent to failing to impose $\beta_2 = 0$ on (8-2) in estimation. But (8-2) is not incorrect; it simply fails to incorporate $\beta_2 = 0$. Therefore, we do not need to prove formally that the least squares estimator of $\beta$ in (8-2) is unbiased even given the restriction; we have already proved it. We can assert on the basis of all our earlier results that

$$E[b \mid X] = \begin{bmatrix} \beta_1 \\ \beta_2 \end{bmatrix} = \begin{bmatrix} \beta_1 \\ 0 \end{bmatrix}. \qquad (8\text{-}10)$$

By the same reasoning, $s^2$ is also unbiased:

$$E\left[\frac{e'e}{n - K_1 - K_2} \,\Big|\, X\right] = \sigma^2. \qquad (8\text{-}11)$$

Then where is the problem? It would seem that one would generally want to "overfit" the model. From a theoretical standpoint, the difficulty with this view is that the failure to use correct information is always costly. In this instance, the cost is the reduced precision of the estimates. As we have shown, the covariance matrix in the short regression (omitting $X_2$) is never larger than the covariance matrix for the estimator obtained in the presence of the superfluous variables.¹ Consider again the single-variable comparison given earlier. If $x_2$ is highly correlated with $x_1$, then incorrectly including it in the regression will greatly inflate the variance of the estimator.

¹ There is no loss if $X_1'X_2 = 0$, which makes sense in terms of the information about $X_1$ contained in $X_2$ (here, none). This situation is not likely to occur in practice, however.

8.2.4 MODEL BUILDING: A GENERAL TO SIMPLE STRATEGY

There has been a shift in the general approach to model building in the last 20 years or so, partly based on the results in the previous two sections. With an eye toward maintaining simplicity, model builders would generally begin with a small specification and gradually build up the model ultimately of interest by adding variables. But, based on the preceding results, we can surmise that just about any criterion that would be used to decide whether to add a variable to a current specification would be tainted by the biases caused by the incomplete specification at the early steps. Omitting variables from the equation seems generally to be the worse of the two errors. Thus, the simple-to-general approach to model building has little to recommend it. Building on the work of Hendry [e.g., (1995)] and aided by advances in estimation hardware and software, researchers are now more comfortable beginning their specification searches with large elaborate models involving many variables and perhaps long and complex lag structures. The attractive strategy is then to adopt a general-to-simple, downward reduction of the model to the preferred specification. Of course, this must be tempered by two related considerations. In the "kitchen sink" regression, which contains every variable that might conceivably be relevant, the adoption of a fixed probability for the type I error, say, 5 percent, assures that in a big enough model, some variables will appear to be significant, even if
"by accident." Second, the problems of pretest estimation and stepwise model building also pose some risk of ultimately misspecifying the model. To cite one unfortunately common example, the statistics involved often produce unexplainable lag structures in dynamic models with many lags of the dependent or independent variables.

8.3 CHOOSING BETWEEN NONNESTED MODELS

The classical testing procedures that we have been using have been shown to be most powerful for the types of hypotheses we have considered.² Although use of these procedures is clearly desirable, the requirement that we express the hypotheses in the form of restrictions on the model $y = X\beta + \varepsilon$,

$$H_0: R\beta = q \quad \text{versus} \quad H_1: R\beta \neq q,$$

can be limiting. Two common exceptions are the general problem of determining which of two possible sets of regressors is more appropriate and whether a linear or loglinear model is more appropriate for a given analysis. For the present, we are interested in comparing two competing linear models:

$$H_0: y = X\beta + \varepsilon_0 \qquad (8\text{-}12a)$$

and

$$H_1: y = Z\gamma + \varepsilon_1. \qquad (8\text{-}12b)$$

The classical procedures we have considered thus far provide no means of forming a preference for one model or the other. The general problem of testing nonnested hypotheses such as these has attracted an impressive amount of attention in the theoretical literature and has appeared in a wide variety of empirical applications.³

Before turning to classical- (frequentist-) based tests in this setting, we should note that the Bayesian approach to this question might be more intellectually appealing. Our procedures will continue to be directed toward an objective of rejecting one model in favor of the other. Yet, in fact, if we have doubts as to which of two models is appropriate, then we might well be convinced to concede that possibly neither one is really "the truth." We have rather painted ourselves into a corner with our "left or right" approach. The Bayesian approach to this question treats it as a problem of comparing the two hypotheses rather than testing for the validity of one over the other. We enter our sampling experiment with a set of prior probabilities about the relative merits of the two hypotheses, which is summarized in a "prior odds ratio," $P_{01} = \text{Prob}[H_0]/\text{Prob}[H_1]$. After gathering our data, we construct the Bayes factor, which summarizes the weight of the sample evidence in favor of one model or the other. After the data have been analyzed, we have our "posterior odds ratio," $P_{01} \mid \text{data} = \text{Bayes factor} \times P_{01}$. The upshot is that ex post, neither model is discarded; we have merely revised our assessment of the comparative likelihood of the two in the face of the sample data. Some of the formalities of this approach are discussed in Chapter 16.

² See, for example, Stuart and Ord (1989, Chap. 27).

³ Recent surveys on this subject are White (1982a, 1983), Gourieroux and Monfort (1994), McAleer (1995), and Pesaran and Weeks (2001). McAleer's survey tabulates an array of applications, while Gourieroux and Monfort focus on the underlying theory.

8.3.1 TESTING NONNESTED HYPOTHESES

A useful distinction between hypothesis testing as discussed in the preceding chapters and model selection as considered here will turn on the asymmetry between the null and alternative hypotheses that is a part of the classical testing procedure.⁴ Since, by construction, the classical procedures seek evidence in the sample to refute the "null" hypothesis, how one frames the null can be crucial to the outcome. Fortunately, the Neyman-Pearson methodology provides a prescription; the null is usually cast as the narrowest model in the set under consideration. On the other hand, the classical procedures never reach a sharp conclusion. Unless the significance level of the testing procedure is made so high as to exclude all alternatives, there will always remain the possibility of a type one error. As such, the null is never rejected with certainty, but only with a prespecified degree of confidence. Model selection tests, in contrast, give the competing hypotheses equal standing. There is no natural null hypothesis. However, the end of the process is a firm decision: in testing (8-12a,b), one of the models will be rejected and the other will be retained; the analysis will then proceed in the framework of that one model and not the other. Indeed, it cannot proceed until one of the models is discarded. It is common, for example, in this new setting for the analyst first to test with one model cast as the null, then with the other. Unfortunately, given the way the tests are constructed, it can happen that both or neither model is rejected; in either case, further analysis is clearly warranted. As we shall see, the science is a bit inexact.

The earliest work on nonnested hypothesis testing, notably Cox (1961, 1962), was done in the framework of sample likelihoods and maximum likelihood procedures. Recent developments have been structured around a common pillar labeled the encompassing principle [Mizon and Richard (1986)]. In the large, the principle directs attention to the question of whether a maintained model can explain the features of its competitors, that is, whether the maintained model encompasses the alternative. Yet a third approach is based on forming a comprehensive model which contains both competitors as special cases. When possible, the test between models can be based, essentially, on classical (-like) testing procedures. We will examine tests that exemplify all three approaches.

⁴ See Granger and Pesaran (2000) for discussion.

8.3.2 AN ENCOMPASSING MODEL

The encompassing approach is one in which the ability of one model to explain features of another is tested. Model 0 "encompasses" Model 1 if the features of Model 1 can be explained by Model 0 but the reverse is not true.⁵ Since $H_0$ cannot be written as a restriction on $H_1$, none of the procedures we have considered thus far is appropriate. One possibility is an artificial nesting of the two models. Let $\bar{X}$ be the set of variables in $X$ that are not in $Z$, define $\bar{Z}$ likewise with respect to $X$, and let $W$ be the variables that the models have in common. Then $H_0$ and $H_1$ could be combined in a "supermodel":

$$y = \bar{X}\bar\beta + \bar{Z}\bar\gamma + W\delta + \varepsilon.$$

In principle, $H_1$ is rejected if it is found that $\bar\gamma = 0$ by a conventional $F$ test, whereas $H_0$ is rejected if it is found that $\bar\beta = 0$. There are two problems with this approach. First, $\delta$ remains a mixture of parts of $\beta$ and $\gamma$, and it is not established by the $F$ test that either of these parts is zero. Hence, this test does not really distinguish between $H_0$ and $H_1$; it distinguishes between $H_1$ and a hybrid model. Second, this compound model may have an extremely large number of regressors. In a time-series setting, the problem of collinearity may be severe.

Consider an alternative approach. If $H_0$ is correct, then $y$ will, apart from the random disturbance $\varepsilon$, be fully explained by $X$. Suppose we then attempt to estimate $\gamma$ by regression of $y$ on $Z$. Whatever set of parameters is estimated by this regression,
say $c$, if $H_0$ is correct, then we should estimate exactly the same coefficient vector if we were to regress $X\beta$ on $Z$, since $\varepsilon_0$ is random noise under $H_0$. Since $\beta$ must be estimated, suppose that we use $Xb$ instead and compute $c_0$. A test of the proposition that Model 0 “encompasses” Model 1 would be a test of the hypothesis that $E[c - c_0] = 0$. It is straightforward to show [see Davidson and MacKinnon (1993, pp. 384–387)] that the test can be carried out by using a standard $F$ test to test the hypothesis that $\gamma_1 = 0$ in the augmented regression,

$$y = X\beta + Z_1\gamma_1 + \varepsilon_1,$$

where $Z_1$ is the variables in $Z$ that are not in $X$.

8.3.3 COMPREHENSIVE APPROACH: THE J TEST

The underpinnings of the comprehensive approach are tied to the density function as the characterization of the data generating process. Let $f_0(y_i \mid \text{data}, \beta_0)$ be the assumed density under Model 0 and define the alternative likewise as $f_1(y_i \mid \text{data}, \beta_1)$. Then, a comprehensive model which subsumes both of these is

$$f_c(y_i \mid \text{data}, \beta_0, \beta_1) = \frac{[f_0(y_i \mid \text{data}, \beta_0)]^{1-\lambda}\,[f_1(y_i \mid \text{data}, \beta_1)]^{\lambda}}{\int_{\text{range of } y_i} [f_0(y_i \mid \text{data}, \beta_0)]^{1-\lambda}\,[f_1(y_i \mid \text{data}, \beta_1)]^{\lambda}\, dy_i}.$$

Estimation of the comprehensive model followed by a test of $\lambda = 0$ or 1 is used to assess the validity of Model 0 or 1, respectively.⁶

⁵ See Deaton (1982), Dastoor (1983), Gourieroux, et al. (1983, 1995) and, especially, Mizon and Richard (1986).

⁶ See Section 21.4.4c for an application to the choice of probit or logit model for binary choice suggested by Silva (2001).

The J test proposed by Davidson and MacKinnon (1981) can be shown [see Pesaran and Weeks (2001)] to be an application of this principle to the linear regression model. Their suggested alternative to the preceding compound model is

$$y = (1 - \lambda)X\beta + \lambda(Z\gamma) + \varepsilon.$$

In this model, a test of $\lambda = 0$ would be a test against $H_1$. The problem is that $\lambda$ cannot be separately estimated in this model; it would amount to a redundant scaling of the regression coefficients. Davidson and MacKinnon’s J test consists of estimating $\gamma$ by a least squares regression of $y$ on $Z$ followed by a least squares regression of $y$ on $X$ and $Z\hat{\gamma}$, the fitted values in the first regression. A valid test, at least asymptotically, of $H_1$ is to test $H_0\colon \lambda = 0$. If $H_0$ is true, then plim $\hat{\lambda} = 0$. Asymptotically, the ratio $\hat{\lambda}/\text{se}(\hat{\lambda})$ (i.e., the usual $t$ ratio) is distributed as standard normal and may be referred to the standard table to carry out the test. Unfortunately, in testing $H_0$ versus $H_1$ and vice versa, all four possibilities (reject both, neither, or either one of the two hypotheses) could occur. This issue, however, is a finite sample problem. Davidson and MacKinnon show that as $n \to \infty$, if $H_1$ is true, then the probability that $\hat{\lambda}$ will differ significantly from zero approaches 1.

Example 8.2 J Test for a Consumption Function

Gaver and Geisel (1974) propose two forms of a consumption function:

$$H_0\colon C_t = \beta_1 + \beta_2 Y_t + \beta_3 Y_{t-1} + \varepsilon_{0t}$$

and

$$H_1\colon C_t = \gamma_1 + \gamma_2 Y_t + \gamma_3 C_{t-1} + \varepsilon_{1t}.$$

The first model states that consumption responds to changes in income over two periods, whereas the second states that the effects of changes in income on consumption persist for many periods. Quarterly data on aggregate U.S. real consumption and real disposable income are given in Table F5.1. Here we apply the J test to these data and the two proposed specifications. First, the two models are estimated separately (using observations 1950.2–2000.4). The least squares regression of $C$ on a constant, $Y$, lagged $Y$, and the fitted values from the second model produces an estimate of $\lambda$ of 1.0145 with a $t$ ratio of 62.861. Thus, $H_0$ should be rejected in favor of $H_1$. But reversing the roles of $H_0$ and $H_1$, we obtain an estimate of $\lambda$ of −10.677 with a $t$ ratio of −7.188. Thus, $H_1$ is rejected as well.⁷

⁷ For related discussion of this possibility, see McAleer, Fisher, and Volker (1982).

8.3.4 THE COX TEST⁸

⁸ The Cox test is based upon the likelihood ratio statistic, which will be developed in Chapter 17. The results for the linear regression model, however, are based on sums of squared residuals, and therefore, rely on nothing more than least squares, which is already familiar.

Likelihood ratio tests rely on three features of the density of the random variable of interest. First, under the null hypothesis, the average log density of the null hypothesis will be less than under the alternative; this is a consequence of the fact that the null model is nested within the alternative. Second, the degrees of freedom for the chi-squared statistic is the reduction in the dimension of the parameter space that is specified by the null hypothesis, compared to the alternative. Third, in order to carry out the test, under the null hypothesis, the test statistic must have a known distribution which is free of the model parameters under the alternative hypothesis. When the models are nonnested, none of these requirements will be met. The first need not hold at all. With regard to the second, the parameter space under the null model may well be larger than (or at least the same size as) that under the alternative. (Merely reversing the two models does not solve this problem. The test must be able to work in both directions.) Finally, because of the symmetry of the null and alternative hypotheses, the distributions of likelihood based test statistics will generally be functions of the parameters of the alternative model. Cox’s (1961, 1962) analysis of this problem produced a reformulated test statistic that is based on the standard normal distribution and is centered at zero.⁹

⁹ See Pesaran and Weeks (2001) for some of the formalities of these results.

Versions of the Cox test appropriate for the linear and nonlinear regression models have been derived by Pesaran (1974) and Pesaran and Deaton (1978). The latter present a test statistic for testing linear versus loglinear models that is extended in Aneuryn-Evans and Deaton (1980). Since in the classical regression model the least squares estimator is also the maximum likelihood estimator, it is perhaps not surprising that Davidson and MacKinnon (1981, p. 789) find that their test statistic is asymptotically equal to the negative of the Cox–Pesaran and Deaton statistic.

The Cox statistic for testing the hypothesis that $X$ is the correct set of regressors and that $Z$ is not is

$$c_{01} = \frac{n}{2}\ln\left[\frac{s_Z^2}{s_X^2 + (1/n)\,b'X'M_Z X b}\right] = \frac{n}{2}\ln\left[\frac{s_Z^2}{s_{ZX}^2}\right], \tag{8-13}$$

where

$M_Z = I - Z(Z'Z)^{-1}Z'$,
$M_X = I - X(X'X)^{-1}X'$,
$b = (X'X)^{-1}X'y$,
$s_Z^2 = e_Z'e_Z/n$ = mean-squared residual in the regression of $y$ on $Z$,
$s_X^2 = e_X'e_X/n$ = mean-squared residual in the regression of $y$ on $X$,
$s_{ZX}^2 = s_X^2 + b'X'M_Z X b/n$.

The hypothesis is tested by comparing

$$q = \frac{c_{01}}{\{\text{Est. Var}[c_{01}]\}^{1/2}} = \frac{c_{01}}{\sqrt{\dfrac{s_X^2}{s_{ZX}^4}\, b'X'M_Z M_X M_Z X b}} \tag{8-14}$$

to the critical value from the standard normal table. A large value of $q$ is evidence against the null hypothesis ($H_0$).

The Cox test appears to involve an impressive amount of matrix algebra. But the algebraic results are deceptive. One needs only to compute linear regressions and retrieve fitted values and sums of squared residuals. The following does the first test. The roles of $X$ and $Z$ are reversed for the second.

1. Regress $y$ on $X$ to obtain $b$ and $\hat{y}_X = Xb$, $e_X = y - Xb$, $s_X^2 = e_X'e_X/n$.
2. Regress $y$ on $Z$ to obtain $d$ and $\hat{y}_Z = Zd$, $e_Z = y - Zd$, $s_Z^2 = e_Z'e_Z/n$.
3. Regress $\hat{y}_X$ on $Z$ to obtain $d_X$ and $e_{Z.X} = \hat{y}_X - Zd_X = M_Z X b$, $e_{Z.X}'e_{Z.X} = b'X'M_Z X b$.
4. Regress $e_{Z.X}$ on $X$ and compute residuals $e_{X.ZX}$, $e_{X.ZX}'e_{X.ZX} = b'X'M_Z M_X M_Z X b$.
5. Compute $s_{ZX}^2 = s_X^2 + e_{Z.X}'e_{Z.X}/n$.
6. Compute $c_{01} = \dfrac{n}{2}\log\dfrac{s_Z^2}{s_{ZX}^2}$, $v_{01} = \dfrac{s_X^2\,(e_{X.ZX}'e_{X.ZX})}{s_{ZX}^4}$, $q = \dfrac{c_{01}}{\sqrt{v_{01}}}$.

Therefore, the Cox statistic can be computed simply by computing a series of least squares regressions.

Example 8.3 Cox Test for a Consumption Function

We continue the previous example by applying the Cox test to the data of Example 8.2. For purposes of the test, let $X = [\,i \;\; y \;\; y_{-1}]$ and $Z = [\,i \;\; y \;\; c_{-1}]$. Using the notation of (8-13) and (8-14), we find that

$s_X^2 = 7{,}556.657$,
$s_Z^2 = 456.3751$,
$b'X'M_Z X b = 167.50707$,
$b'X'M_Z M_X M_Z X b = 2.61944$,
$s_{ZX}^2 = 7{,}556.657 + 167.50707/203 = 7{,}557.483$.

Thus,

$$c_{01} = \frac{203}{2}\ln\left[\frac{456.3751}{7{,}557.483}\right] = -284.908$$

and

$$\text{Est. Var}[c_{01}] = \frac{7{,}556.657\,(2.61944)}{7{,}557.483^2} = 0.00034656.$$

Thus, $q = -15{,}304.281$. On this basis, we reject the hypothesis that $X$ is the correct set of regressors. Note in the previous example that we reached the same conclusion based on a $t$ ratio of 62.861. As expected, the result has the opposite sign from the corresponding J statistic in the previous example. Now we reverse the roles of $X$ and $Z$ in our calculations. Letting $d$ denote the least squares coefficients in the regression of consumption on $Z$, we find that

$d'Z'M_X Z d = 1{,}418{,}985.185$,
$d'Z'M_X M_Z M_X Z d = 22{,}189.811$,
$s_{XZ}^2 = 456.3751 + 1{,}418{,}985.185/203 = 7{,}446.4499$.

Thus,

$$c_{10} = \frac{203}{2}\ln\left[\frac{7{,}556.657}{7{,}446.4499}\right] = 1.4912$$

and

$$\text{Est. Var}[c_{10}] = \frac{456.3751\,(22{,}189.811)}{7{,}446.4499^2} = 0.18263.$$

This computation produces a value of $q = 3.489$, which is smaller (in absolute value) than its counterpart in Example 8.2, −7.188. Since 3.489 exceeds the 5 percent critical value of 1.96, we once again reject the hypothesis that $Z$ is the preferred set of regressors, though the results do strongly favor $Z$ in qualitative terms.

Pesaran and Hall (1988) have extended the Cox test to testing which of two nonnested restricted regressions is preferred. The modeling framework is

$$H_0\colon y = X_0\beta_0 + \varepsilon_0,\quad \text{Var}[\varepsilon_0 \mid X_0] = \sigma_0^2 I,\quad \text{subject to } R_0\beta_0 = q_0$$

$$H_1\colon y = X_1\beta_1 + \varepsilon_1,\quad \text{Var}[\varepsilon_1 \mid X_1] = \sigma_1^2 I,\quad \text{subject to } R_1\beta_1 = q_1.$$

Like its counterpart for unrestricted regressions, this Cox test requires a large amount of matrix algebra. However, once again, it reduces to a sequence of regressions, though this time with some unavoidable matrix manipulation remaining. Let

$$G_i = (X_i'X_i)^{-1} - (X_i'X_i)^{-1}R_i'\,[R_i(X_i'X_i)^{-1}R_i']^{-1}R_i(X_i'X_i)^{-1},\quad i = 0, 1,$$

and $T_i = X_i G_i X_i'$, $m_i = \text{rank}(R_i)$, $k_i = \text{rank}(X_i)$, $h_i = k_i - m_i$ and $d_i = n - h_i$, where $n$ is the sample size. The following steps produce the needed statistics:

1. Compute $e_i$ = the residuals from the restricted regression, $i = 0, 1$.
2. Compute $e_{10}$ by computing the residuals from the restricted regression of $y - e_0$ on $X_1$. Compute $e_{01}$ likewise by reversing the subscripts.
3. Compute $e_{100}$ as the residuals from the restricted regression of $y - e_{10}$ on $X_0$ and $e_{110}$ likewise by reversing the subscripts.

Let $v_i$, $v_{ij}$ and $v_{ijk}$ denote the sums of squared residuals in Steps 1, 2, and 3 and let $s_i^2 = e_i'e_i/d_i$.

4. Compute $\text{trace}(B_0^2) = h_1 - \text{trace}[(T_0T_1)^2] - \{h_1 - \text{trace}[(T_0T_1)]\}^2/(n - h_0)$ and $\text{trace}(B_1^2)$ likewise by reversing subscripts.
5. Compute $s_{10}^2 = v_{10} + s_0^2\,\text{trace}[I - T_0 - T_1 + T_0T_1]$ and $s_{01}^2$ likewise.

The authors propose several statistics. A Wald test based on Godfrey and Pesaran (1983) is based on the difference between an estimator of $\sigma_1^2$ and the probability limit of this estimator assuming that $H_0$ is true:

$$W_0 = \frac{\sqrt{n}\,(v_1 - v_0 - v_{10})}{\sqrt{4 v_0 v_{100}}}.$$

Under the null hypothesis of Model 0, the limiting distribution of $W_0$ is standard normal. An alternative statistic based on Cox’s likelihood approach is

$$N_0 = \frac{(n/2)\ln\left(s_1^2/s_{10}^2\right)}{\sqrt{4 v_{100}\, s_0^2/\left(s_{10}^2\right)^2}}.$$

Example 8.4 Cox Test for Restricted Regressions

The example they suggest is two competing models for expected inflation, $P_t^e$, based on commonly used lag structures involving lags of $P_t^e$ and current and lagged values of actual inflation, $P_t$:

$$\text{(Regressive):}\quad P_t^e = P_t + \theta_1(P_t - P_{t-1}) + \theta_2(P_{t-1} - P_{t-2}) + \varepsilon_{0t}$$

$$\text{(Adaptive):}\quad P_t^e = P_{t-1}^e + \lambda_1\left(P_t - P_{t-1}^e\right) + \lambda_2\left(P_{t-1} - P_{t-2}^e\right) + \varepsilon_{1t}.$$

By formulating these models as

$$y_t = \beta_1 P_{t-1}^e + \beta_2 P_{t-2}^e + \beta_3 P_t + \beta_4 P_{t-1} + \beta_5 P_{t-2} + \varepsilon_t,$$

they show that the hypotheses are

$$H_0\colon \beta_1 = \beta_2 = 0,\quad \beta_3 + \beta_4 + \beta_5 = 1$$

$$H_1\colon \beta_1 + \beta_3 = 1,\quad \beta_2 + \beta_4 = 0,\quad \beta_5 = 0.$$

Pesaran and Hall’s analysis was based on quarterly data for British manufacturing from 1972 to 1981. The data appear in the Appendix to Pesaran (1987) and are reproduced in Table F8.1. Using their data, the computations listed before produce the following results:

$W_0$: Null is $H_0$; −3.887, Null is $H_1$; −0.134
$N_0$: Null is $H_0$; −2.437, Null is $H_1$; −0.032.

These results fairly strongly support Model 1 and lead to rejection of Model 0.¹⁰

¹⁰ Our results differ somewhat from Pesaran and Hall’s. For the first row of the table, they reported (−2.180, −1.690) and for the second, (−2.456, −1.907). They reach the same conclusion, but the numbers do differ substantively. We have been unable to resolve the difference.

8.4 MODEL SELECTION CRITERIA

The preceding discussion suggested some approaches to model selection based on nonnested hypothesis tests. Fit measures and testing procedures based on the sum of squared residuals, such as $R^2$ and the Cox test, are useful when interest centers on the within-sample fit or within-sample prediction of the dependent variable. When the model building is directed toward forecasting, within-sample measures are not necessarily optimal. As we have seen, $R^2$ cannot fall when variables are added to a model, so there is a built-in tendency to overfit the model. This criterion may point us away from the best forecasting model, because adding variables to a model may increase the variance of the forecast error (see Section 6.6) despite the improved fit to the data. With this thought in mind, the adjusted $R^2$,

$$\bar{R}^2 = 1 - \frac{n-1}{n-K}(1 - R^2) = 1 - \frac{n-1}{n-K}\left[\frac{e'e}{\sum_{i=1}^{n}(y_i - \bar{y})^2}\right], \tag{8-15}$$

has been suggested as a fit measure that appropriately penalizes the loss of degrees of freedom that results from adding variables to the model. Note that $\bar{R}^2$ may fall when a variable is added to a model if the sum of squares does not fall fast enough. (The applicable result appears in Theorem 3.7; $\bar{R}^2$ does not rise when a variable is added to a model unless the $t$ ratio associated with that variable exceeds one in absolute value.) The adjusted $R^2$ has been found to be a preferable fit measure for assessing the fit of forecasting models. [See Diebold (1998b, p. 87), who argues that the simple $R^2$ has a downward bias as a measure of the out-of-sample, one-step-ahead prediction error variance.]

The adjusted $R^2$ penalizes the loss of degrees of freedom that occurs when a model is expanded. There is, however, some question about whether the penalty is sufficiently large to ensure that the criterion will necessarily lead the analyst to the correct model (assuming that it is among the ones considered) as the sample size increases. Two alternative fit measures that have been suggested are the Akaike information criterion,

$$\text{AIC}(K) = s_y^2(1 - R^2)e^{2K/n} \tag{8-16}$$

and the Schwarz or Bayesian information criterion,

$$\text{BIC}(K) = s_y^2(1 - R^2)n^{K/n}. \tag{8-17}$$

(There is no degrees of freedom correction in $s_y^2$.) Both measures improve (decline) as $R^2$ increases, but, everything else constant, degrade as the model size increases. Like $\bar{R}^2$, these measures place a premium on achieving a given fit with a smaller number of parameters per observation, $K/n$. Logs are usually more convenient; the measures reported by most software are

$$\text{AIC}(K) = \log\left(\frac{e'e}{n}\right) + \frac{2K}{n} \tag{8-18}$$

$$\text{BIC}(K) = \log\left(\frac{e'e}{n}\right) + \frac{K\log n}{n}. \tag{8-19}$$

Both prediction criteria have their virtues, and neither has an obvious advantage over the other. [See Diebold (1998b, p. 90).] The Schwarz criterion, with its heavier penalty for degrees of freedom lost, will lean toward a simpler model. All else given, simplicity does have some appeal.

8.5 SUMMARY AND CONCLUSIONS

This is the last of seven chapters that we have devoted specifically to the most heavily used tool in econometrics, the classical linear regression model. We began in Chapter 2 with a statement of the regression model. Chapter 3 then described computation of the parameters by least squares, a purely algebraic exercise. Chapters 4 and 5 reinterpreted least squares as an estimator of an unknown parameter vector, and described the finite sample and large sample characteristics of the sampling distribution of the estimator. Chapters 6 and 7 were devoted to building and sharpening the regression model, with tools for developing the functional form and statistical results for testing hypotheses about the underlying population. In this chapter, we have examined some broad issues related to model specification and selection of a model among a set of competing alternatives. The concepts considered here are tied very closely to one of the pillars of the paradigm of econometrics, that underlying the model is a theoretical construction, a set of true behavioral relationships that constitute the model. It is only on this notion that the concepts of bias and biased estimation and model selection make any sense: “bias” as a concept can only be described with respect to some underlying “model” against which an estimator can be said to be biased
. That is, there must be a yardstick. This concept is a central result in the analysis of specification, where we considered the implications of underfitting (omitting variables) and overfitting (including superfluous variables) the model. We concluded this chapter (and our discussion of the classical linear regression model) with an examination of procedures that are used to choose among competing model specifications.

Key Terms and Concepts

• Adjusted R-squared
• Akaike criterion
• Biased estimator
• Comprehensive model
• Cox test
• Encompassing principle
• General-to-simple strategy
• Inclusion of superfluous variables
• J test
• Mean squared error
• Model selection
• Nonnested models
• Omission of relevant variables
• Omitted variable formula
• Prediction criterion
• Pretest estimator
• Schwarz criterion
• Simple-to-general
• Specification analysis
• Stepwise model building

Exercises

1. Suppose the true regression model is given by (8-2). The result in (8-4) shows that if either $P_{1.2}$ is nonzero or $\beta_2$ is nonzero, then regression of $y$ on $X_1$ alone produces a biased and inconsistent estimator of $\beta_1$. Suppose the objective is to forecast $y$, not to estimate the parameters. Consider regression of $y$ on $X_1$ alone to estimate $\beta_1$ with $b_1$ (which is biased). Is the forecast of $y$ computed using $X_1 b_1$ also biased? Assume that $E[X_2 \mid X_1]$ is a linear function of $X_1$. Discuss your findings generally. What are the implications for prediction when variables are omitted from a regression?
2. Compare the mean squared errors of $b_1$ and $b_{1.2}$ in Section 8.2.2. (Hint: The comparison depends on the data and the model parameters, but you can devise a compact expression for the two quantities.)
3. The J test in Example 8.2 is carried out using over 50 years of data. It is optimistic to hope that the underlying structure of the economy did not change in 50 years. Does the result of the test carried out in Example 8.2 persist if it is based on data only from 1980 to 2000? Repeat the computation with this subset of the data.
4. The Cox test in Example 8.3 has the same difficulty as the J test in Example 8.2. The sample period might be too long for the test not to have been affected by underlying structural change. Repeat the computations using the 1980 to 2000 data.

9 NONLINEAR REGRESSION MODELS

9.1 INTRODUCTION

Although the linear model is flexible enough to allow great variety in the shape of the regression, it still rules out many useful functional forms. In this chapter, we examine regression models that are intrinsically nonlinear in their parameters. This allows a much wider range of functional forms than the linear model can accommodate.¹

¹ A complete discussion of this subject can be found in Amemiya (1985). Other important references are Jennrich (1969), Malinvaud (1970), and especially Goldfeld and Quandt (1971, 1972). A very lengthy authoritative treatment is the text by Davidson and MacKinnon (1993).

9.2 NONLINEAR REGRESSION MODELS

The general form of the nonlinear regression model is

$$y_i = h(x_i, \beta) + \varepsilon_i. \tag{9-1}$$

The linear model is obviously a special case. Moreover, some models which appear to be nonlinear, such as

$$y = e^{\beta_0} x_1^{\beta_1} x_2^{\beta_2} e^{\varepsilon}$$

become linear after a transformation, in this case after taking logarithms. In this chapter, we are interested in models for which there is no such transformation, such as the ones in the following examples.

Example 9.1 CES Production Function

In Example 7.5, we examined a constant elasticity of substitution production function model:

$$\ln y = \ln\gamma - \frac{\nu}{\rho}\ln\left[\delta K^{-\rho} + (1 - \delta)L^{-\rho}\right] + \varepsilon.$$

No transformation renders this equation linear in the parameters. We did find, however, that a linear Taylor series approximation to this function around the point $\rho = 0$ produced an intrinsically linear equation that could be fit by least squares. Nonetheless, the true model is nonlinear in the sense that interests us in this chapter.

Example 9.2 Translog Demand System

Christensen, Jorgenson, and Lau (1975) proposed the translog indirect utility function for a consumer allocating a budget among $K$ commodities:

$$-\ln V = \beta_0 + \sum_{k=1}^{K}\beta_k\ln(p_k/M) + \sum_{k=1}^{K}\sum_{l=1}^{K}\gamma_{kl}\ln(p_k/M)\ln(p_l/M)$$

where $V$ is indirect utility, $p_k$ is the price for the $k$th commodity and $M$ is income. Roy’s identity applied to this logarithmic function produces a budget share equation for the $k$th commodity that is of the form

$$S_k = -\frac{\partial\ln V/\partial\ln p_k}{\partial\ln V/\partial\ln M} = \frac{\beta_k + \sum_{j=1}^{K}\gamma_{kj}\ln(p_j/M)}{\beta_M + \sum_{j=1}^{K}\gamma_{Mj}\ln(p_j/M)} + \varepsilon,\quad k = 1, \ldots, K,$$

where $\beta_M = \sum_k \beta_k$ and $\gamma_{Mj} = \sum_k \gamma_{kj}$. No transformation of the budget share equation produces a linear model. This is an intrinsically nonlinear
regression model. (It is also one among a system of equations, an aspect we will ignore for the present.)

9.2.1 ASSUMPTIONS OF THE NONLINEAR REGRESSION MODEL

We shall require a somewhat more formal definition of a nonlinear regression model. Sufficient for our purposes will be the following, which include the linear model as the special case noted earlier. We assume that there is an underlying probability distribution, or data generating process (DGP) for the observable $y_i$ and a true parameter vector, $\beta$, which is a characteristic of that DGP. The following are the assumptions of the nonlinear regression model:

1. Functional form: The conditional mean function for $y_i$ given $x_i$ is

$$E[y_i \mid x_i] = h(x_i, \beta),\quad i = 1, \ldots, n,$$

where $h(x_i, \beta)$ is a twice continuously differentiable function.

2. Identifiability of the model parameters: The parameter vector in the model is identified (estimable) if there is no nonzero parameter $\beta^0 \neq \beta$ such that $h(x_i, \beta^0) = h(x_i, \beta)$ for all $x_i$. In the linear model, this was the full rank assumption, but the simple absence of “multicollinearity” among the variables in $x$ is not sufficient to produce this condition in the nonlinear regression model. Note that the model given in Example 9.2 is not identified. If the parameters in the model are all multiplied by the same nonzero constant, the same conditional mean function results. This condition persists even if all the variables in the model are linearly independent. The indeterminacy was removed in the study cited by imposing the normalization $\beta_M = 1$.

3. Zero mean of the disturbance: It follows from Assumption 1 that we may write

$$y_i = h(x_i, \beta) + \varepsilon_i,$$

where $E[\varepsilon_i \mid h(x_i, \beta)] = 0$. This states that the disturbance at observation $i$ is uncorrelated with the conditional mean function for all observations in the sample. This is not quite the same as assuming that the disturbances and the exogenous variables are uncorrelated, which is the familiar assumption, however. We will return to this point below.

4. Homoscedasticity and nonautocorrelation: As in the linear model, we assume conditional homoscedasticity,

$$E\left[\varepsilon_i^2 \mid h(x_j, \beta),\ j = 1, \ldots, n\right] = \sigma^2,\ \text{a finite constant}, \tag{9-2}$$

and nonautocorrelation

$$E[\varepsilon_i\varepsilon_j \mid h(x_i, \beta), h(x_j, \beta),\ j = 1, \ldots, n] = 0\quad\text{for all } j \neq i.$$

5. Data generating process: The data generating process for $x_i$ is assumed to be a well behaved population such that first and second moments of the data can be assumed to converge to fixed, finite population counterparts. The crucial assumption is that the process generating $x_i$ is strictly exogenous to that generating $\varepsilon_i$. The data on $x_i$ are assumed to be “well behaved.”

6. Underlying probability model: There is a well defined probability distribution generating $\varepsilon_i$. At this point, we assume only that this process produces a sample of uncorrelated, identically (marginally) distributed random variables $\varepsilon_i$ with mean 0 and variance $\sigma^2$ conditioned on $h(x_i, \beta)$. Thus, at this point, our statement of the model is semiparametric. (See Section 16.3.) We will not be assuming any particular distribution for $\varepsilon_i$. The conditional moment assumptions in 3 and 4 will be sufficient for the results in this chapter. In Chapter 17, we will fully parameterize the model by assuming that the disturbances are normally distributed. This will allow us to be more specific about certain test statistics and, in addition, allow some generalizations of the regression model. The assumption is not necessary here.

9.2.2 THE ORTHOGONALITY CONDITION AND THE SUM OF SQUARES

Assumptions 1 and 3 imply that $E[\varepsilon_i \mid h(x_i, \beta)] = 0$. In the linear model, it follows, because of the linearity of the conditional mean, that $\varepsilon_i$ and $x_i$, itself, are uncorrelated. However, uncorrelatedness of $\varepsilon_i$ with a particular nonlinear function of $x_i$ (the regression function) does not necessarily imply uncorrelatedness with $x_i$, itself nor, for that matter, with other nonlinear functions of $x_i$. On the other hand, the results we will obtain below for the behavior of the estimator in this model are couched not in terms of $x_i$ but in terms of certain functions of $x_i$ (the derivatives of the regression function), so, in point of fact, $E[\varepsilon \mid X] = 0$ is not even the assumption we need.

The foregoing is not a theoretical fine point. Dynamic models, which are very common in the contemporary literature, would greatly complicate this analysis. If it can be assumed that $\varepsilon_i$ is strictly uncorrelated with any prior information in the model, including previous disturbances, then perhaps a treatment analogous to that for the linear model would apply. But the convergence results needed to obtain the asymptotic properties of the estimator still have to be strengthened. The dynamic nonlinear regression model is beyond the reach of our treatment here. Strict independence of $\varepsilon_i$ and $x_i$ would be sufficient for uncorrelatedness of $\varepsilon_i$ and every function of $x_i$, but, again, in a dynamic model, this assumption might be questionable. Some commentary on this aspect of the nonlinear regression model may be found in Davidson and MacKinnon (1993).

If the disturbances in the nonlinear model are normally distributed, then the log of the normal density for the $i$th observation will be

$$\ln f(y_i \mid x_i, \beta, \sigma^2) = -(1/2)\left[\ln 2\pi + \ln\sigma^2 + \varepsilon_i^2/\sigma^2\right]. \tag{9-3}$$

For this special case, we have from item D.2 in Theorem 17.2 (on maximum likelihood estimation), that the derivatives of the log density with respect to the parameters have mean zero. That is,

$$E\left[\frac{\partial\ln f(y_i \mid x_i, \beta, \sigma^2)}{\partial\beta}\right] = E\left[\frac{1}{\sigma^2}\left(\frac{\partial h(x_i, \beta)}{\partial\beta}\right)\varepsilon_i\right] = 0, \tag{9-4}$$

so, in the normal case, the derivatives and the disturbances are uncorrelated. Whether this can be assumed to hold in other cases is going to be model specific, but under reasonable conditions, we would assume so. [See Ruud (2000, p. 540).]

In the context of the linear model, the orthogonality condition $E[x_i\varepsilon_i] = 0$ produces least squares as a GMM estimator for the model. (See Chapter 18.) The orthogonality condition is that the regressors and the disturbance in the model are uncorrelated. In this setting, the same condition applies to the first derivatives of the conditional mean function. The result in (9-4) produces a moment condition which will define the nonlinear least squares estimator as a GMM estimator.

Example 9.3 First-Order Conditions for a Nonlinear Model

The first-order conditions for estimating the parameters of the nonlinear model,

$$y_i = \beta_1 + \beta_2 e^{\beta_3 x_i} + \varepsilon_i,$$

by nonlinear least squares [see (9-10)] are

$$\frac{\partial S(b)}{\partial b_1} = -\sum_{i=1}^{n}\left[y_i - b_1 - b_2 e^{b_3 x_i}\right] = 0,$$

$$\frac{\partial S(b)}{\partial b_2} = -\sum_{i=1}^{n}\left[y_i - b_1 - b_2 e^{b_3 x_i}\right]e^{b_3 x_i} = 0,$$

$$\frac{\partial S(b)}{\partial b_3} = -\sum_{i=1}^{n}\left[y_i - b_1 - b_2 e^{b_3 x_i}\right]b_2 x_i e^{b_3 x_i} = 0.$$

These equations do not have an explicit solution.

Conceding the potential for ambiguity, we define a nonlinear regression model at this point as follows.

DEFINITION 9.1 Nonlinear Regression Model
A nonlinear regression model is one for which the first-order conditions for least squares estimation of the parameters are nonlinear functions of the parameters.

Thus, nonlinearity is defined in terms of the techniques needed to estimate the parameters, not the shape of the regression
function. Later we shall broaden our definition to include other techniques besides least squares.

9.2.3 THE LINEARIZED REGRESSION

The nonlinear regression model is $y = h(x, \beta) + \varepsilon$. (To save some notation, we have dropped the observation subscript.) The sampling theory results that have been obtained for nonlinear regression models are based on a linear Taylor series approximation to $h(x, \beta)$ at a particular value for the parameter vector, $\beta^0$:

$$h(x, \beta) \approx h(x, \beta^0) + \sum_{k=1}^{K}\frac{\partial h(x, \beta^0)}{\partial\beta_k^0}\left(\beta_k - \beta_k^0\right). \tag{9-5}$$

This form of the equation is called the linearized regression model. By collecting terms, we obtain

$$h(x, \beta) \approx \left[h(x, \beta^0) - \sum_{k=1}^{K}\beta_k^0\left(\frac{\partial h(x, \beta^0)}{\partial\beta_k^0}\right)\right] + \sum_{k=1}^{K}\beta_k\left(\frac{\partial h(x, \beta^0)}{\partial\beta_k^0}\right). \tag{9-6}$$

Let $x_k^0$ equal the $k$th partial derivative,² $\partial h(x, \beta^0)/\partial\beta_k^0$. For a given value of $\beta^0$, $x_k^0$ is a function only of the data, not of the unknown parameters. We now have

$$h(x, \beta) \approx \left[h^0 - \sum_{k=1}^{K}x_k^0\beta_k^0\right] + \sum_{k=1}^{K}x_k^0\beta_k,$$

which may be written

$$h(x, \beta) \approx h^0 - x^{0\prime}\beta^0 + x^{0\prime}\beta,$$

which implies that

$$y \approx h^0 - x^{0\prime}\beta^0 + x^{0\prime}\beta + \varepsilon.$$

By placing the known terms on the left-hand side of the equation, we obtain a linear equation:

$$y^0 = y - h^0 + x^{0\prime}\beta^0 = x^{0\prime}\beta + \varepsilon^0. \tag{9-7}$$

Note that $\varepsilon^0$ contains both the true disturbance, $\varepsilon$, and the error in the first order Taylor series approximation to the true regression, shown in (9-6). That is,

$$\varepsilon^0 = \varepsilon + \left[h(x, \beta) - \left\{h^0 - \sum_{k=1}^{K}x_k^0\beta_k^0 + \sum_{k=1}^{K}x_k^0\beta_k\right\}\right]. \tag{9-8}$$

Since all the errors are accounted for, (9-7) is an equality, not an approximation. With a value of $\beta^0$ in hand, we could compute $y^0$ and $x^0$ and then estimate the parameters of (9-7) by linear least squares. (Whether this estimator is consistent or not remains to be seen.)

Example 9.4 Linearized Regression

For the model in Example 9.3, the regressors in the linearized equation would be

$$x_1^0 = \frac{\partial h(.)}{\partial\beta_1^0} = 1,$$

$$x_2^0 = \frac{\partial h(.)}{\partial\beta_2^0} = e^{\beta_3^0 x},$$

$$x_3^0 = \frac{\partial h(.)}{\partial\beta_3^0} = \beta_2^0 x e^{\beta_3^0 x}.$$

With a set of values of the parameters $\beta^0$,

$$y^0 = y - h\left(x, \beta_1^0, \beta_2^0, \beta_3^0\right) + \beta_1^0 x_1^0 + \beta_2^0 x_2^0 + \beta_3^0 x_3^0$$

could be regressed on the three variables previously defined to estimate $\beta_1$, $\beta_2$, and $\beta_3$.

² You should verify that for the linear regression model, these derivatives are the independent variables.

9.2.4 LARGE SAMPLE PROPERTIES OF THE NONLINEAR LEAST SQUARES ESTIMATOR

Numerous analytical results have been obtained for the nonlinear least squares estimator, such as consistency and asymptotic normality. We cannot be sure that nonlinear least squares is the most efficient estimator, except in the case of normally distributed disturbances. (This conclusion is the same one we drew for the linear model.) But, in the semiparametric setting of this chapter, we can ask whether this estimator is optimal in some sense given the information that we do have; the answer turns out to be yes. Some examples that follow will illustrate the points.

It is necessary to make some assumptions about the regressors. The precise requirements are discussed in some detail in Judge et al. (1985), Amemiya (1985), and Davidson and MacKinnon (1993). In the linear regression model, to obtain our asymptotic results, we assume that the sample moment matrix $(1/n)X'X$ converges to a positive definite matrix $Q$. By analogy, we impose the same condition on the derivatives of the regression function, which are called the pseudoregressors in the linearized model when they are computed at the true parameter values. Therefore, for the nonlinear regression model, the analog to (5-1) is

$$\text{plim}\,\frac{1}{n}X^{0\prime}X^0 = \text{plim}\,\frac{1}{n}\sum_{i=1}^{n}\left(\frac{\partial h(x_i, \beta^0)}{\partial\beta^0}\right)\left(\frac{\partial h(x_i, \beta^0)}{\partial\beta^{0\prime}}\right) = Q^0, \tag{9-9}$$

where $Q^0$ is a positive definite matrix. To establish consistency of $b$ in the linear model, we required plim $(1/n)X'\varepsilon = 0$. We will use the counterpart to this for the pseudoregressors:

$$\text{plim}\,\frac{1}{n}\sum_{i=1}^{n}x_i^0\varepsilon_i = 0.$$

This is the orthogonality condition noted earlier in (5-4). In particular, note that orthogonality of the disturbances and the data is not the same condition. Finally, asymptotic normality can be established under general conditions if

$$\frac{1}{\sqrt{n}}\sum_{i=1}^{n}x_i^0\varepsilon_i \xrightarrow{d} N[0, \sigma^2 Q^0].$$

With these in hand, the asymptotic properties of the nonlinear least squares estimator have been derived. They are, in fact, essentially those we have already seen for the linear model, except that in this case we place the derivatives of the linearized function evaluated at $\beta$, $X^0$, in the role of the regressors. [Amemiya (1985).]

The nonlinear least squares criterion function is

$$S(b) = \frac{1}{2}\sum_{i=1}^{n}[y_i - h(x_i, b)]^2 = \frac{1}{2}\sum_{i=1}^{n}e_i^2, \tag{9-10}$$

where we have inserted what will be the solution value, $b$. The values of the parameters that minimize (one half of) the sum of squared deviations are the nonlinear least squares estimators. The first-order conditions for a minimum are

$$g(b) = -\sum_{i=1}^{n}[y_i - h(x_i, b)]\frac{\partial h(x_i, b)}{\partial b} = 0. \tag{9-11}$$

In the linear model of Chapter 2, this produces a set of linear equations, the normal equations (3-4). But in this more general case, (9-11) is a set of nonlinear equations that do not have an explicit solution. Note that $\sigma^2$ is not relevant to the solution [nor was it in (3-4)]. At the solution,

$$g(b) = -X^{0\prime}e = 0,$$

which is the same as (3-12) for the linear model.

Given our assumptions, we have the following general results:

THEOREM 9.1 Consistency of the Nonlinear Least Squares Estimator
If the following assumptions hold:
a. The parameter space containing $\beta$ is compact (has no gaps or nonconcave regions),
b. For any vector $\beta^0$ in that parameter space, plim $(1/n)S(\beta^0) = q(\beta^0)$, a continuous and differentiable function,
c. $q(\beta^0)$ has a unique minimum at the true parameter vector, $\beta$,
then, the nonlinear least squares estimator defined by (9-10) and (9-11) is consistent.

We will sketch the proof, then consider why the theorem and the proof differ as they do from the apparently simpler counterpart for the linear model. The proof, notwithstanding the underlying subtleties of the assumptions, is straightforward. The estimator, say, $b^0$ minimizes $(1/n)S(\beta^0)$. If $(1/n)S(\beta^0)$ is minimized for every $n$, then it is minimized by $b^0$ as $n$ increases without bound. We also assumed that the minimizer of $q(\beta^0)$ is uniquely $\beta$. If the minimum value of plim $(1/n)S(\beta^0)$ equals the probability limit of the minimized value of the sum of squares, the theorem is proved. This equality is produced by the continuity in assumption b.

In the linear model, consistency of the least squares estimator could be established based on plim $(1/n)X'X = Q$ and plim $(1/n)X'\varepsilon = 0$. To follow that approach here, we would use the linearized model, and take essentially the same result. The loose end in that argument would be that the linearized model is not the true model, and there remains an approximation. In order for this line of reasoning to be valid, it must also be either assumed or shown that plim $(1/n)X^{0\prime}\delta = 0$ where $\delta_i = h(x_i, \beta)$ minus the Taylor series approximation. An argument to this effect appears in Mittelhammer et al. (2000, pp. 190–191).

THEOREM 9.2 Asymptotic Normality of the Nonlinear Least Squares Estimator
If the pseudoregressors defined in (9-9) are “well behaved,” then

$$b \overset{a}{\sim} N\left[\beta, \frac{\sigma^2}{n}(Q^0)^{-1}\right],$$

where

$$Q^0 = \text{plim}\,\frac{1}{n}X^{0\prime}X^0.$$

The sample estimate of the asymptotic covariance matrix is

$$\text{Est. Asy. Var}[b] = \hat{\sigma}^2(X^{0\prime}X^0)^{-1}. \tag{9-12}$$

Asymptotic efficiency of the nonlinear least squares estimator is difficult to establish without a distributional assumption. There is an indirect approach that is one possibility. The assumption of the orthogonality of the pseudoregressors and the true disturbances implies that the nonlinear least squares estimator is a GMM estimator in this context. With the assumptions of homoscedasticity and nonautocorrelation, the optimal weighting matrix is the one that we used, which is to say that in the class of GMM estimators for this model, nonlinear least squares uses the optimal weighting matrix. As such, it is asymptotically efficient.

The requirement that the matrix in (9-9) converges to a positive definite matrix implies that the columns of the regressor matrix $X^0$ must be linearly independent. This identification condition is analogous to the requirement that the independent variables in the linear model be linearly independent. Nonlinear regression models usually involve several independent variables, and at first blush, it might seem sufficient to examine the data directly if one is concerned with multicollinearity. However, this situation is not the case. Example 9.5 gives an application.

9.2.5 COMPUTING THE NONLINEAR LEAST SQUARES ESTIMATOR

Minimizing the sum of squares is a standard problem in nonlinear optimization that can be solved by a number of methods. (See Section E.6.) The method of Gauss–Newton is often used. In the linearized regression model, if a value of $\beta^0$ is available, then the linear regression model shown in (9-7) can be estimated by linear least squares. Once a parameter vector is obtained, it can play the role of a new $\beta^0$, and the computation can be done again. The iteration can continue until the difference between successive parameter vectors is small enough to assume convergence. One of the main virtues of this method is that at the last iteration the estimate of $(Q^0)^{-1}$ will, apart from the scale factor $\hat{\sigma}^2/n$, provide the correct estimate of the asymptotic covariance matrix for the parameter estimator.

This iterative solution to the minimization problem is

$$b_{t+1} = \left[\sum_{i=1}^{n}x_i^0 x_i^{0\prime}\right]^{-1}\left[\sum_{i=1}^{n}x_i^0\left(y_i - h_i^0 + x_i^{0\prime}b_t\right)\right]$$

−1nn=b+x0x0x0
y−h0tiiiiii=1i=1=b+(X0X0)−1X0e0t=bt+t,wherealltermsontheright-handsideareevaluatedatbande0isthevectorofnonlin-tearleastsquaresresiduals.Thisalgorithmhassomeintuitiveappealaswell.Foreachiteration,weupdatethepreviousparameterestimatesbyregressingthenonlinearleastsquaresresidualsonthederivativesoftheregressionfunctions.Theprocesswillhaveconverged(i.e.,theupdatewillbe0)whenX0e0iscloseenoughto0.Thisderivativehasadirectcounterpartinthenormalequationsforthelinearmodel,Xe=0.Asusual,whenusingadigitalcomputer,wewillnotachieveexactconvergencewithX0e0exactlyequaltozero.Auseful,scale-freecounterparttotheconvergencecriteriondiscussedinSectionE.6.5isδ=e0X0(X0X0)−1X0e0.Wenote,finally,thatiterationofthelinearizedregression,althoughaveryeffectivealgorithmformanyproblems,doesnotalwayswork.AsdoesNewton’smethod,thisalgorithmsometimes“jumpsoff”toawildlyerrantseconditerate,afterwhichitmaybeimpossibletocomputetheresidualsforthenextiteration.Thechoiceofstartingvaluesfortheiterationscanbecrucial.Thereisartaswellasscienceinthecomputationofnonlinearleastsquaresestimates.[SeeMcCulloughandVinod(1999).]Intheabsenceofinformationaboutstartingvalues,aworkablestrategyistotrytheGauss–Newtoniterationfirst.Ifitfails,gobacktotheinitialstartingvaluesandtryoneofthemoregeneralalgorithms,suchasBFGS,treatingminimizationofthesumofsquaresasanotherwiseordinaryoptimizationproblem.Aconsistentestimatorofσ2isbasedontheresiduals:1nσˆ2=[y−h(x,b)]2.(9-13)iini=1Adegreesoffreedomcorrection,1/(n−K),whereKisthenumberofelementsinβ,isnotstrictlynecessaryhere,becauseallresultsareasymptoticinanyevent.DavidsonandMacKinnon(1993)arguethatonaverage,(9-13)willunderestimateσ2,andoneshouldusethedegreesoffreedomcorrection.Mostsoftwareincurrentuseforthismodeldoes,butanalystswillwanttoverifywhichisthecasefortheprogramtheyareusing.Withthisinhand,theestimatoroftheasymptoticcovariancematrixforthenonlinearleastsquaresestimatorisgivenin(9-12).Oncethenonlinearleastsquaresestimatesareinhand,inferenceandhypothesistestscanproceedinthesamefashion
as prescribed in Chapter 7. A minor problem can arise in evaluating the fit of the regression in that the familiar measure,

$$R^2 = 1 - \frac{\sum_{i=1}^{n} e_i^2}{\sum_{i=1}^{n}(y_i - \bar y)^2}, \qquad (9\text{-}14)$$

is no longer guaranteed to be in the range of 0 to 1. It does, however, provide a useful descriptive measure.

9.3 APPLICATIONS

We will examine two applications. The first is a nonlinear extension of the consumption function examined in Example 2.1. The Box–Cox transformation presented in Section 9.3.2 is a device used to search for functional form in regression.

9.3.1 A Nonlinear Consumption Function

The linear consumption function analyzed at the beginning of Chapter 2 is a restricted version of the more general consumption function

$$C = \alpha + \beta Y^{\gamma} + \varepsilon,$$

in which $\gamma$ equals 1. With this restriction, the model is linear. If $\gamma$ is free to vary, however, then this version becomes a nonlinear regression. The linearized model is

$$C - \left(\alpha^0 + \beta^0 Y^{\gamma^0}\right) + \left(\alpha^0 \cdot 1 + \beta^0 Y^{\gamma^0} + \gamma^0\beta^0 Y^{\gamma^0}\ln Y\right) = \alpha + \beta Y^{\gamma^0} + \gamma\beta^0 Y^{\gamma^0}\ln Y + \varepsilon.$$

The nonlinear least squares procedure reduces to iterated regression of

$$C^0 = C + \gamma^0\beta^0 Y^{\gamma^0}\ln Y \quad\text{on}\quad x^0 = \begin{bmatrix}\partial h(.)/\partial\alpha\\ \partial h(.)/\partial\beta\\ \partial h(.)/\partial\gamma\end{bmatrix} = \begin{bmatrix}1\\ Y^{\gamma^0}\\ \beta^0 Y^{\gamma^0}\ln Y\end{bmatrix}.$$

Quarterly data on consumption, real disposable income, and several other variables for 1950 to 2000 are listed in Appendix Table F5.1. We will use these to fit the nonlinear consumption function. This turns out to be a particularly straightforward estimation problem. Iterations are begun at the linear least squares estimates for $\alpha$ and $\beta$ and 1 for $\gamma$. As shown below, the solution is reached in 8 iterations, after which any further iteration is merely “fine tuning” the hidden digits (i.e., those that the analyst would not be reporting to their reader). (“Gradient” is the scale-free convergence measure noted above.)

    Begin NLSQ iterations. Linearized regression.
    Iteration = 1; Sum of squares = 1536321.88;    Gradient = 996103.930
    Iteration = 2; Sum of squares = .1847×10^12;   Gradient = .1847×10^12
    Iteration = 3; Sum of squares = 20406917.6;    Gradient = 19902415.7
    Iteration = 4; Sum of squares = 581703.598;    Gradient = 77299.6342
    Iteration = 5; Sum of squares = 504403.969;    Gradient = .752189847
    Iteration = 6; Sum of squares = 504403.216;    Gradient = .526642396E-04
    Iteration = 7; Sum of squares = 504403.216;    Gradient = .511324981E-07
    Iteration = 8; Sum of squares = 504403.216;    Gradient = .606793426E-10

The linear and nonlinear least squares regression results are shown in Table 9.1.

Finding the starting values for a nonlinear procedure can be difficult. Simply trying a convenient set of values can be unproductive. Unfortunately, there are no good rules for starting values, except that they should be as close to the final values as possible (not particularly helpful). When it is possible, an initial consistent estimator of $\beta$ will be a good starting value. In many cases, however, the only consistent estimator available is the one we are trying to compute by least squares. For better or worse, trial and error is the most frequently used procedure. For the present model, a natural set of values can be obtained because a simple linear model is a special case. Thus, we can start $\alpha$ and $\beta$ at the linear least squares values that would result in the special case of $\gamma = 1$ and use 1 for the starting value for $\gamma$. The procedures outlined earlier are used at the last iteration to obtain the asymptotic standard errors and an estimate of $\sigma^2$. (To make this comparable to $s^2$ in the linear model, the value includes the degrees of freedom correction.) The estimates for the linear model are shown in Table 9.1 as well. Eight iterations are required for convergence. The value of $\delta$ is shown at the right. Note that the coefficient vector takes a very errant step after the first iteration (the sum of squares becomes huge), but the iterations settle down after that and converge routinely.

TABLE 9.1 Estimated Consumption Functions

                  Linear Model                    Nonlinear Model
    Parameter     Estimate        Std. Error      Estimate        Std. Error
    α             −80.3547        14.3059         458.7990        22.5014
    β             0.9217          0.003872        0.10085         0.01091
    γ             1.0000          —               1.24483         0.01205
    e′e           1,536,321.881                   504,403.1725
    σ             87.20983                        50.0946
    R²            0.996448                        0.998834
    Var[b]        —                               0.000119037
    Var[c]        —                               0.00014532
    Cov[b,c]      —                               −0.000131491

For hypothesis testing and confidence intervals, the usual procedures can be used, with the proviso that all results are only asymptotic. As such, for testing a restriction, the chi-squared statistic rather than the F ratio is likely to be more appropriate. For example, for testing whether $\gamma$ is different from 1, an asymptotic t test, based on the standard normal distribution, is carried out, using

$$z = \frac{1.24483 - 1}{0.01205} = 20.3178.$$

This result is larger than the critical value of 1.96 for the 5 percent significance level, and we thus reject the linear model in favor of the nonlinear regression. We are also interested in the marginal propensity to consume. In this expanded model, $H_0: \gamma = 1$ is a test that the marginal propensity to consume is constant, not that it is 1. (That would be a joint test of both $\gamma = 1$ and $\beta = 1$.) In this model, the marginal propensity to consume is

$$MPC = \frac{dC}{dY} = \beta\gamma Y^{\gamma - 1},$$

which varies with $Y$. To test the hypothesis that this value is 1, we require a particular value of $Y$. Since it is the most recent value, we choose $DPI_{2000.4} = 6634.9$. At this value, the MPC is estimated as 1.08264. We estimate its standard error using the delta method, with the square root of

$$\begin{bmatrix}\partial MPC/\partial b & \partial MPC/\partial c\end{bmatrix}\begin{bmatrix}\text{Var}[b] & \text{Cov}[b,c]\\ \text{Cov}[b,c] & \text{Var}[c]\end{bmatrix}\begin{bmatrix}\partial MPC/\partial b\\ \partial MPC/\partial c\end{bmatrix}$$

$$= \begin{bmatrix}cY^{c-1} & bY^{c-1}(1 + c\ln Y)\end{bmatrix}\begin{bmatrix}0.00011904 & -0.000131491\\ -0.000131491 & 0.00014532\end{bmatrix}\begin{bmatrix}cY^{c-1}\\ bY^{c-1}(1 + c\ln Y)\end{bmatrix} = 0.00007469,$$

which gives a standard error of 0.0086425. For testing the hypothesis that the MPC is equal to 1.0 in 2000.4, we would refer

$$z = \frac{1.08264 - 1}{0.0086425} = 9.562$$

to a standard normal table. This difference is certainly statistically significant, so we would reject the hypothesis.

Example 9.5 Multicollinearity in Nonlinear Regression
In the preceding example, there is no question of collinearity in the data matrix $X = [i, y]$; the variation in $Y$ is obvious on inspection. But at the final parameter estimates, the $R^2$ in the regression is 0.999312 and the correlation between the two pseudoregressors $x_2^0 = Y^{\gamma}$ and $x_3^0 = \beta Y^{\gamma}\ln Y$ is 0.999752. The condition number for the normalized matrix of sums of squares and cross products is 208.306. (The condition number is computed as the square root of the ratio of the largest to smallest characteristic root of $D^{-1}X^{0\prime}X^0D^{-1}$, where $x_1^0 = 1$ and $D$ is the diagonal matrix containing the square roots of $x_k^{0\prime}x_k^0$ on the diagonal.) Recall that 20 was the benchmark value for a problematic data set. By the standards discussed in Section 4.9.1, the collinearity problem in this “data set” is severe.

9.3.2 THE BOX–COX TRANSFORMATION

The Box–Cox transformation is a device for generalizing the linear model. The transformation is [3]

$$x^{(\lambda)} = \frac{x^{\lambda} - 1}{\lambda}.$$

In a regression model, the analysis can be done conditionally. For a given value of $\lambda$, the model

$$y = \alpha + \sum_{k=1}^{K}\beta_k x_k^{(\lambda)} + \varepsilon \qquad (9\text{-}15)$$

is a linear regression that can be estimated by least squares. [4] In principle, each regressor could be transformed by a different value of $\lambda$, but, in most applications, this level of generality becomes excessively cumbersome, and $\lambda$ is assumed to be the same for all the variables in the model. [5] At the same time, it is also possible to transform $y$, say, by $y^{(\theta)}$. Transformation of the dependent variable, however, amounts to a specification of the whole model, not just the functional form. We will examine this case more closely in Section 17.6.2.

Footnote 3: Box and Cox (1964). To be defined for all values of $\lambda$, $x$ must be strictly positive. See also Zarembka (1974).
Footnote 4: In most applications, some of the regressors (for example, dummy variables) will not be transformed. For such a variable, say $\nu_k$, $\nu_k^{(\lambda)} = \nu_k$, and the relevant derivatives in (9-16) will be zero.
Footnote 5: See, for example, Seaks and Layson (1983).

Example 9.6 Flexible Cost Function
Caves, Christensen, and Trethaway (1980) analyzed the costs of production for railroads providing freight and passenger service. Continuing a long line of literature on the costs of production in regulated industries, a translog cost function (see Section 14.3.2) would be a natural choice for modeling this multiple-output technology. Several of the firms in the study, however, produced no passenger service, which would preclude the use of the translog model. (This model would require the log of zero.) An alternative is the Box–Cox transformation, which is computable for zero output levels. A constraint must still be placed on $\lambda$ in their model, as $0^{(\lambda)}$ is defined only if $\lambda$ is strictly positive. A positive value of $\lambda$ is not assured. A question does arise in this context (and other similar ones) as to whether zero outputs should be treated the same as nonzero outputs or whether an output of zero represents a discrete corporate decision distinct from other variations in the output levels. In addition, as can be seen in (9-16), this solution is only partial. The zero values of the regressors preclude computation of appropriate standard errors.

If $\lambda$ in (9-15) is taken to be an unknown parameter, then the regression becomes nonlinear in the parameters. Although no transformation will reduce it to linearity, nonlinear least squares is straightforward. In most instances, we can expect to find the least squares value of $\lambda$ between −2 and 2. Typically, then, $\lambda$ is estimated by scanning this range for the value that minimizes the sum of squares. When $\lambda$ equals zero, the transformation is, by L’Hôpital’s rule,

$$\lim_{\lambda\to 0}\frac{x^{\lambda} - 1}{\lambda} = \lim_{\lambda\to 0}\frac{d(x^{\lambda} - 1)/d\lambda}{1} = \lim_{\lambda\to 0} x^{\lambda}\times\ln x = \ln x.$$

Once the optimal value of $\lambda$ is located, the least squares estimates, the mean squared residual, and this value of $\lambda$ constitute the nonlinear least squares (and, with normality of the disturbance, maximum likelihood) estimates of the parameters.

After determining the optimal value of $\lambda$, it is sometimes treated as if it were a known value in the least squares results. But $\hat\lambda$ is an estimate of an unknown parameter. It is not hard to show that the least squares standard errors will always underestimate the correct asymptotic standard errors. [6] To get the appropriate values, we need the derivatives of the right-hand side of (9-15) with respect to $\alpha$, $\beta$, and $\lambda$. In the notation of Section 9.2.3, these are

$$\frac{\partial h(.)}{\partial\alpha} = 1,$$

$$\frac{\partial h(.)}{\partial\beta_k} = x_k^{(\lambda)}, \qquad (9\text{-}16)$$

$$\frac{\partial h(.)}{\partial\lambda} = \sum_{k=1}^{K}\beta_k\frac{\partial x_k^{(\lambda)}}{\partial\lambda} = \sum_{k=1}^{K}\beta_k\,\frac{1}{\lambda}\left[x_k^{\lambda}\ln x_k - x_k^{(\lambda)}\right].$$

Footnote 6: See Fomby, Hill, and Johnson (1984, pp. 426–431).

We can now use (9-12) and (9-13) to estimate the asymptotic covariance matrix of the parameter estimates. Note that $\ln x_k$ appears in $\partial h(.)/\partial\lambda$. If $x_k = 0$, then this matrix cannot be computed. This was the point noted at the end of Example 9.6.

It is important to remember that the coefficients in a nonlinear model are not equal to the slopes (i.e., here the demand elasticities) with respect to the variables. For the Box–Cox model, [7]

$$\ln Y = \alpha + \beta\left[\frac{X^{\lambda} - 1}{\lambda}\right] + \varepsilon, \qquad (9\text{-}17)$$

$$\frac{dE[\ln Y \mid X]}{d\ln X} = \beta X^{\lambda} = \eta.$$

Standard errors for these estimates can be obtained using the delta method. The derivatives are $\partial\eta/\partial\beta = \eta/\beta$ and $\partial\eta/\partial\lambda = \eta\ln X$. Collecting terms, we obtain

$$\text{Asy.Var}[\hat\eta] = (\eta/\beta)^2\left\{\text{Asy.Var}[\hat\beta] + (\beta\ln X)^2\,\text{Asy.Var}[\hat\lambda] + (2\beta\ln X)\,\text{Asy.Cov}[\hat\beta, \hat\lambda]\right\}.$$

Footnote 7: We have used the result $d\ln Y/d\ln X = X\,d\ln Y/dX$.

9.4 HYPOTHESIS TESTING AND PARAMETRIC RESTRICTIONS

In most cases, the sorts of hypotheses one would test in this context will involve fairly simple linear restrictions. The tests can be carried out using the usual formulas discussed in Chapter 7 and the asymptotic covariance matrix presented earlier. For more involved hypotheses and for nonlinear restrictions, the procedures are a bit less clear-cut. Three principal testing procedures were discussed in Section 6.4 and Appendix C: the Wald, likelihood ratio, and Lagrange multiplier tests. For the linear model, all three statistics are transformations of the standard F statistic (see Section 17.6.1), so the tests are essentially identical. In the nonlinear case, they are equivalent only asymptotically. We will work through the Wald and Lagrange multiplier tests for the general case and then apply them to the example of the previous section. Since we have not assumed normality of the disturbances (yet), we will postpone treatment of the likelihood ratio statistic until we revisit this model in Chapter 17.

9.4.1 SIGNIFICANCE TESTS FOR RESTRICTIONS: F AND WALD STATISTICS

The hypothesis to be tested is

$$H_0: r(\beta) = q, \qquad (9\text{-}18)$$

where $r(\beta)$ is a column vector of $J$ continuous functions of the elements of $\beta$. These restrictions may be linear or nonlinear. It is necessary, however, that they be overidentifying restrictions. Thus, in formal terms, if the original parameter vector has $K$ free elements, then the hypothesis $r(\beta) - q$ must impose at least one functional relationship on the parameters. If there is more than one restriction, then they must be functionally independent. These two conditions imply that the $J\times K$ matrix

$$R(\beta) = \frac{\partial r(\beta)}{\partial\beta'} \qquad (9\text{-}19)$$

must have full row rank and that $J$, the number of restrictions, must be strictly less than $K$. (This situation is analogous to the linear model, in which $R(\beta)$ would be the matrix of coefficients in the restrictions.)

Let $b$ be the unrestricted, nonlinear least squares estimator, and let $b_*$ be the estimator obtained when the constraints of the hypothesis are imposed. [8] Which test statistic one uses depends on how difficult the computations are. Unlike the linear model, the various testing procedures vary in complexity. For instance, in our example, the Lagrange multiplier is by far the simplest to compute. Of the four methods we will consider, only this test does not require us to compute a nonlinear regression.

Footnote 8: This computational problem may be extremely difficult in its own right, especially if the constraints are nonlinear. We assume that the estimator has been obtained by whatever means are necessary.

The nonlinear analog to the familiar F statistic based on the fit of the regression (i.e., the sum of squared residuals) would be

$$F[J, n-K] = \frac{[S(b_*) - S(b)]/J}{S(b)/(n-K)}. \qquad (9\text{-}20)$$

This equation has the appearance of our earlier F ratio. In the nonlinear setting, however, neither the numerator nor the denominator has exactly the necessary chi-squared distribution, so the F distribution is only approximate. Note that this F statistic requires that both the restricted and unrestricted models be estimated.

The Wald test is based on the distance between $r(b)$ and $q$. If the unrestricted estimates fail to satisfy the restrictions, then doubt is cast on the validity of the restrictions. The statistic is

$$W = [r(b) - q]'\left\{\text{Est.Asy.Var}[r(b) - q]\right\}^{-1}[r(b) - q] \qquad (9\text{-}21)$$

$$= [r(b) - q]'\left\{R(b)\hat V R'(b)\right\}^{-1}[r(b) - q],$$

where $\hat V = \text{Est.Asy.Var}[b]$, and $R(b)$ is evaluated at $b$, the estimate of $\beta$.

Under the null hypothesis, this statistic has a limiting chi-squared distribution with $J$ degrees of freedom. If the restrictions are correct, the Wald statistic and $J$ times the F statistic are asymptotically equivalent. The Wald statistic can be based on the estimated covariance matrix obtained earlier using the unrestricted estimates, which may provide a large savings in computing effort if the restrictions are nonlinear. It should be noted that the small-sample behavior of $W$ can be erratic, and the more conservative F statistic may be preferable if the sample is not large.

The caveat about Wald statistics that applied in the linear case applies here as well. Because it is a pure significance test that does not involve the alternative hypothesis, the Wald statistic is not invariant to how the hypothesis is framed. In cases in which there is more than one equivalent way to specify $r(\beta) = q$, $W$ can give different answers depending on which is chosen.

9.4.2 TESTS BASED ON THE LM STATISTIC

The Lagrange multiplier test is based on the decrease in the sum of squared residuals that would result if the restrictions in the restricted model were released. The formalities of the test are given in Sections 17.5.3 and 17.6.1. For the nonlinear regression model, the test has a particularly appealing form. [9] Let $e_*$ be the vector of residuals $y_i - h(x_i, b_*)$ computed using the restricted estimates. Recall that we defined $X^0$ as an $n\times K$ matrix of derivatives computed at a particular parameter vector in (9-9). Let $X_*^0$ be this matrix computed at the restricted estimates. Then the Lagrange multiplier statistic for the nonlinear regression model is

$$LM = \frac{e_*'X_*^0\left[X_*^{0\prime}X_*^0\right]^{-1}X_*^{0\prime}e_*}{e_*'e_*/n}. \qquad (9\text{-}22)$$

Under $H_0$, this statistic has a limiting chi-squared distribution with $J$ degrees of freedom. What is especially appealing about this approach is that it requires only the restricted estimates. This method may provide some savings in computing effort if, as in our example, the restrictions result in a linear model. Note, also, that the Lagrange multiplier statistic is $n$ times the uncentered $R^2$ in the regression of $e_*$ on $X_*^0$. Many Lagrange multiplier statistics are computed in this fashion.

Footnote 9: This test is derived in Judge et al. (1985). A lengthy discussion appears in Mittelhammer et al. (2000).

Example 9.7 Hypotheses Tests in a Nonlinear Regression Model
We test the hypothesis $H_0: \gamma = 1$ in the consumption function of Section 9.3.1.

• F statistic. The F statistic is

$$F[1, 204-3] = \frac{(1{,}536{,}321.881 - 504{,}403.57)/1}{504{,}403.57/(204-3)} = 411.29.$$

The critical value from the tables is 4.18, so the hypothesis is rejected.

• Wald statistic. For our example, the Wald statistic is based on the distance of $\hat\gamma$ from 1 and is simply the square of the asymptotic t ratio we computed at the end of the example:

$$W = \frac{(1.244827 - 1)^2}{0.01205^2} = 412.805.$$

The critical value from the chi-squared table is 3.84.

• Lagrange multiplier. For our example, the elements in $x_i^*$ are

$$x_i^* = [1, Y^{\gamma}, \beta Y^{\gamma}\ln Y].$$

To compute this at the restricted estimates, we use the ordinary least squares estimates for $\alpha$ and $\beta$ and 1 for $\gamma$ so that

$$x_i^* = [1, Y, \beta Y\ln Y].$$

The residuals are the least squares residuals computed from the linear regression. Inserting the values given earlier, we have

$$LM = \frac{996{,}103.9}{(1{,}536{,}321.881/204)} = 132.267.$$

As expected, this statistic is also larger than the critical value from the chi-squared table.

9.4.3 A SPECIFICATION TEST FOR NONLINEAR REGRESSIONS: THE PE TEST

MacKinnon, White, and Davidson (1983) have extended the J test discussed in Section 8.3.3 to nonlinear regressions. One result of this analysis is a simple test for linearity versus loglinearity.

The specific hypothesis to be tested is

$$H_0: y = h^0(x, \beta) + \varepsilon_0$$

versus

$$H_1: g(y) = h^1(z, \gamma) + \varepsilon_1,$$

where $x$ and $z$ are regressor vectors and $\beta$ and $\gamma$ are the parameters. As the authors note, using $y$ instead of,
say, $j(y)$ in the first function is nothing more than an implicit definition of the units of measurement of the dependent variable.

An intermediate case is useful. If we assume that $g(y)$ is equal to $y$ but we allow $h^0(.)$ and $h^1(.)$ to be nonlinear, then the necessary modification of the J test is straightforward, albeit perhaps a bit more difficult to carry out. For this case, we form the compound model

$$y = (1-\alpha)h^0(x, \beta) + \alpha h^1(z, \gamma) + \varepsilon \qquad (9\text{-}23)$$

$$= h^0(x, \beta) + \alpha\left[h^1(z, \gamma) - h^0(x, \beta)\right] + \varepsilon.$$

Presumably, both $\beta$ and $\gamma$ could be estimated in isolation by nonlinear least squares. Suppose that a nonlinear least squares estimate of $\gamma$ has been obtained. One approach is to insert this estimate in (9-23) and then estimate $\beta$ and $\alpha$ by nonlinear least squares. The J test amounts to testing the hypothesis that $\alpha$ equals zero. Of course, the model is symmetric in $h^0(.)$ and $h^1(.)$, so their roles could be reversed. The same conclusions drawn earlier would apply here.

Davidson and MacKinnon (1981) propose what may be a simpler alternative. Given an estimate of $\beta$, say $\hat\beta$, approximate $h^0(x, \beta)$ with a linear Taylor series at this point. The result is

$$h^0(x, \beta) \approx h^0(x, \hat\beta) + \left[\frac{\partial h^0(.)}{\partial\hat\beta'}\right](\beta - \hat\beta) = \hat h^0 + \hat H^0\beta - \hat H^0\hat\beta. \qquad (9\text{-}24)$$

Using this device, they replace (9-23) with

$$y - \hat h^0 = \hat H^0 b + \alpha\left[h^1(z, \hat\gamma) - h^0(x, \hat\beta)\right] + e,$$

in which $b$ and $\alpha$ can be estimated by linear least squares. As before, the J test amounts to testing the significance of $\hat\alpha$. If it is found that $\hat\alpha$ is significantly different from zero, then $H_0$ is rejected. For the authors’ asymptotic results to hold, any consistent estimator of $\beta$ will suffice for $\hat\beta$; the nonlinear least squares estimator that they suggest seems a natural choice. [10]

Footnote 10: This procedure assumes that $H_0$ is correct, of course.

Now we can generalize the test to allow a nonlinear function, $g(y)$, in $H_1$. Davidson and MacKinnon require $g(y)$ to be monotonic, continuous, and continuously differentiable and not to introduce any new parameters. (This requirement excludes the Box–Cox model, which is considered in Section 9.3.2.) The compound model that forms the basis of the test is

$$(1-\alpha)\left[y - h^0(x, \beta)\right] + \alpha\left[g(y) - h^1(z, \gamma)\right] = \varepsilon. \qquad (9\text{-}25)$$

Again, there are two approaches. As before, if $\hat\gamma$ is an estimate of $\gamma$, then $\beta$ and $\alpha$ can be estimated by maximum likelihood conditional on this estimate. [11] This method promises to be extremely messy, and an alternative is proposed. Rewrite (9-25) as

$$y - h^0(x, \beta) = \alpha\left[h^1(z, \gamma) - g(y)\right] + \alpha\left[y - h^0(x, \beta)\right] + \varepsilon.$$

Now use the same linear Taylor series expansion for $h^0(x, \beta)$ on the left-hand side and replace both $y$ and $h^0(x, \beta)$ with $\hat h^0$ on the right. The resulting model is

$$y - \hat h^0 = \hat H^0 b + \alpha\left[\hat h^1 - g(\hat h^0)\right] + e. \qquad (9\text{-}26)$$

As before, with an estimate of $\beta$, this model can be estimated by least squares.

Footnote 11: Least squares will be inappropriate because of the transformation of $y$, which will translate to a Jacobian term in the log-likelihood. See the later discussion of the Box–Cox model.

This modified form of the J test is labeled the PE test. As the authors discuss, it is probably not as powerful as any of the Wald or Lagrange multiplier tests that we have considered. In their experience, however, it has sufficient power for applied research and is clearly simple to carry out.

The PE test can be used to test a linear specification against a loglinear model. For this test, both $h^0(.)$ and $h^1(.)$ are linear, whereas $g(y) = \ln y$. Let the two competing models be denoted

$$H_0: y = x'\beta + \varepsilon$$

and

$$H_1: \ln y = \ln(x)'\gamma + \varepsilon.$$

[We stretch the usual notational conventions by using $\ln(x)$ for $(\ln x_1, \ldots, \ln x_k)$.] Now let $b$ and $c$ be the two linear least squares estimates of the parameter vectors. The PE test for $H_1$ as an alternative to $H_0$ is carried out by testing the significance of the coefficient $\hat\alpha$ in the model

$$y = x'\beta + \alpha\left[\widehat{\ln y} - \ln(x'b)\right] + \phi. \qquad (9\text{-}27)$$

The second term is the difference between predictions of $\ln y$ obtained directly from the loglinear model and obtained as the log of the prediction from the linear model. We can also reverse the roles of the two formulas and test $H_0$ as the alternative. The compound regression is

$$\ln y = \ln(x)'\gamma + \alpha\left(\hat y - e^{\ln(x)'c}\right) + \varepsilon. \qquad (9\text{-}28)$$

The test of linearity vs. loglinearity has been the subject of a number of studies. Godfrey and Wickens (1982) discuss several approaches.

Example 9.8 Money Demand
A large number of studies have estimated money demand equations, some linear and some log-linear. [12] Quarterly data from 1950 to 2000 for estimation of a money demand equation are given in Appendix Table F5.1. The interest rate is the quarterly average of the monthly average 90-day T-bill rate. The money stock is M1. Real GDP is seasonally adjusted and stated in 1996 constant dollars. Results of the PE test of the linear versus the loglinear model are shown in Table 9.2.

Regressions of $M$ on a constant, $r$ and $Y$, and $\ln M$ on a constant, $\ln r$ and $\ln Y$, produce the results given in Table 9.2 (standard errors are given in parentheses). Both models appear to fit quite well, [13] and the pattern of significance of the coefficients is the same in both equations. After computing fitted values from the two equations, the estimates of $\alpha$ from the two models are as shown in Table 9.2. Referring these to a standard normal table, we reject the linear model in favor of the loglinear model.

TABLE 9.2 Estimated Money Demand Equations

                 a            b_r          c_Y          R²         s
    Linear       −228.714     −23.849      0.1770       0.95548    76.277
                 (13.891)     (2.044)      (0.00278)
    PE test for the linear model, α̂ = −121.496 (46.353), t = −2.621
    Loglinear    −8.9473      −0.2590      1.8205       0.96647    0.14825
                 (0.2181)     (0.0236)     (0.0289)
    PE test for the loglinear model, α̂ = −0.0003786 (0.0001969), t = 1.925

Footnote 12: A comprehensive survey appears in Goldfeld (1973).
Footnote 13: The interest elasticity is in line with the received results. The income elasticity is quite a bit larger.

9.5 ALTERNATIVE ESTIMATORS FOR NONLINEAR REGRESSION MODELS

Section 9.2 discusses the “standard” case in which the only complication to the classical regression model of Chapter 2 is that the conditional mean function in $y_i = h(x_i, \beta) + \varepsilon_i$ is a nonlinear function of $\beta$. This fact mandates an alternative estimator, nonlinear least squares, and some new interpretation of the “regressors” in the model. In this section, we will consider two extensions of these results. First, as in the linear case, there can be situations in which the assumption that $\text{Cov}[x_i, \varepsilon_i] = 0$ is not reasonable. These situations will, as before, require an instrumental variables treatment, which we consider in Section 9.5.1. Second, there will be models in which it is convenient to estimate the parameters in two steps, estimating one subset at the first step and then using these estimates in a second step at which the remaining parameters are estimated. We will have to modify our asymptotic results somewhat to accommodate this estimation strategy. The two-step estimator is discussed in Section 9.5.2.

9.5.1 NONLINEAR INSTRUMENTAL VARIABLES ESTIMATION

In Section 5.4, we extended the linear regression model to allow for the possibility that the regressors might be correlated with the disturbances.
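A small simulation can make the linear-model problem concrete before it is carried over to the nonlinear case. The sketch below is illustrative only: the data-generating process, the function name `simulate`, the sample size, and the seed are all assumptions introduced here, not taken from the text. It generates a single regressor that shares a shock with the disturbance, then compares the least squares slope with the simple instrumental variables slope $(z'x)^{-1}z'y$.

```python
import random


def simulate(n=20000, beta=1.0, seed=12345):
    """Simulate y = beta*x + eps with x correlated with eps (hypothetical DGP)."""
    rng = random.Random(seed)
    sxx = sxy = szx = szy = 0.0
    for _ in range(n):
        z = rng.gauss(0.0, 1.0)  # instrument: moves x, independent of eps
        u = rng.gauss(0.0, 1.0)  # shock shared by x and eps (source of endogeneity)
        v = rng.gauss(0.0, 1.0)  # noise specific to the disturbance
        x = z + u
        eps = u + 0.5 * v
        y = beta * x + eps
        sxx += x * x
        sxy += x * y
        szx += z * x
        szy += z * y
    b_ols = sxy / sxx  # least squares slope (no constant; all means are zero)
    b_iv = szy / szx   # instrumental variables slope, (z'x)^{-1} z'y
    return b_ols, b_iv


b_ols, b_iv = simulate()
print(f"least squares: {b_ols:.3f}   instrumental variables: {b_iv:.3f}")
```

Under this assumed design, $\text{Cov}[x, \varepsilon] = 1$ and $\text{Var}[x] = 2$, so the least squares slope converges to $\beta + 1/2$ rather than $\beta$, whereas the instrumental variables slope converges to $\beta$ because $z$ is uncorrelated with $\varepsilon$.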
The same problem can arise in nonlinear models. The consumption function estimated in Section 9.3.1 is almost surely a case in point, and we reestimated it using the instrumental variables technique for linear models in Example 5.3. In this section, we will extend the method of instrumental variables to nonlinear regression models.

In the nonlinear model,

$$y_i = h(x_i, \beta) + \varepsilon_i,$$

the covariates $x_i$ may be correlated with the disturbances. We would expect this effect to be transmitted to the pseudoregressors, $x_i^0 = \partial h(x_i, \beta)/\partial\beta$. If so, then the results that we derived for the linearized regression would no longer hold. Suppose that there is a set of variables $[z_1, \ldots, z_L]$ such that

$$\operatorname{plim}(1/n)Z'\varepsilon = 0 \qquad (9\text{-}29)$$

and

$$\operatorname{plim}(1/n)Z'X^0 = Q_{zx}^0 \neq 0,$$

where $X^0$ is the matrix of pseudoregressors in the linearized regression, evaluated at the true parameter values. If the analysis that we did for the linear model in Section 5.4 can be applied to this set of variables, then we will be able to construct a consistent estimator for $\beta$ using the instrumental variables. As a first step, we will attempt to replicate the approach that we used for the linear model. The linearized regression model is given in (9-7),

$$y = h(X, \beta) + \varepsilon \approx h^0 + X^0(\beta - \beta^0) + \varepsilon$$

or

$$y^0 \approx X^0\beta + \varepsilon,$$

where

$$y^0 = y - h^0 + X^0\beta^0.$$

For the moment, we neglect the approximation error in linearizing the model. In (9-29), we have assumed that

$$\operatorname{plim}(1/n)Z'y^0 = \operatorname{plim}(1/n)Z'X^0\beta. \qquad (9\text{-}30)$$

Suppose, as we did before, that there are the same number of instrumental variables as there are parameters, that is, columns in $X^0$. (Note: This number need not be the number of variables. See our preceding example.) Then the “estimator” used before is suggested:

$$b_{IV} = (Z'X^0)^{-1}Z'y^0. \qquad (9\text{-}31)$$

The logic is sound, but there is a problem with this estimator. The unknown parameter vector $\beta$ appears on both sides of (9-30). We might consider the approach we used for our first solution to the nonlinear regression model. That is, with some initial estimator in hand, iterate back and forth between the instrumental variables regression and recomputing the pseudoregressors until the process converges to the fixed point that we seek. Once again, the logic is sound, and in principle, this method does produce the estimator we seek.

If we add to our preceding assumptions

$$\frac{1}{\sqrt{n}}Z'\varepsilon \overset{d}{\longrightarrow} N[0, \sigma^2 Q_{zz}],$$

then we will be able to use the same form of the asymptotic distribution for this estimator that we did for the linear case. Before doing so, we must fill in some gaps in the preceding. First, despite its intuitive appeal, the suggested procedure for finding the estimator is very unlikely to be a good algorithm for locating the estimates. Second, we do not wish to limit ourselves to the case in which we have the same number of instrumental variables as parameters. So, we will consider the problem in general terms. The estimation criterion for nonlinear instrumental variables is a quadratic form,

$$\operatorname{Min}_{\beta}\ S(\beta) = \tfrac{1}{2}\left[y - h(X, \beta)\right]'Z(Z'Z)^{-1}Z'\left[y - h(X, \beta)\right] = \tfrac{1}{2}\,\varepsilon(\beta)'Z(Z'Z)^{-1}Z'\varepsilon(\beta).$$

The first-order conditions for minimization of this weighted sum of squares are

$$\frac{\partial S(\beta)}{\partial\beta} = -X^{0\prime}Z(Z'Z)^{-1}Z'\varepsilon(\beta) = 0.$$

This result is the same one we had for the linear model with $X^0$ in the role of $X$. You should check that when $\varepsilon(\beta) = y - X\beta$, our results for the linear model in Section 5.4 are replicated exactly. This problem, however, is highly nonlinear in most cases, and the repeated least squares approach is unlikely to be effective. But it is a straightforward minimization problem in the frameworks of Appendix E, and instead, we can just treat estimation here as a problem in nonlinear optimization.

We have approached the formulation of this instrumental variables estimator more or less strategically. However, there is a more structured approach. The orthogonality condition

$$\operatorname{plim}(1/n)Z'\varepsilon = 0$$

defines a GMM estimator. With the homoscedasticity and nonautocorrelation assumption, the resultant minimum distance estimator produces precisely the criterion function suggested above. We will revisit this estimator in this context in Chapter 18.

With well-behaved pseudoregressors and instrumental variables, we have the general result for the nonlinear instrumental variables estimator; this result is discussed at length in Davidson and MacKinnon (1993).

THEOREM 9.3 Asymptotic Distribution of the Nonlinear Instrumental Variables Estimator
With well-behaved instrumental variables and pseudoregressors,

$$b_{IV} \overset{a}{\sim} N\left[\beta, \sigma^2\left(Q_{xz}^0 (Q_{zz})^{-1} Q_{zx}^0\right)^{-1}\right].$$

We estimate the asymptotic covariance matrix with

$$\text{Est.Asy.Var}[b_{IV}] = \hat\sigma^2\left[\hat X^{0\prime}Z(Z'Z)^{-1}Z'\hat X^0\right]^{-1},$$

where $\hat X^0$ is $X^0$ computed using $b_{IV}$.

As a final observation, note that the “two-stage least squares” interpretation of the instrumental variables estimator for the linear model still applies here, with respect to the IV estimator. That is, at the final estimates, the first-order conditions (normal equations) imply that

$$X^{0\prime}Z(Z'Z)^{-1}Z'y = X^{0\prime}Z(Z'Z)^{-1}Z'X^0\beta,$$

which says that the estimates satisfy the normal equations for a linear regression of $y$ (not $y^0$) on the predictions obtained by regressing the columns of $X^0$ on $Z$. The interpretation is not quite the same here, because to compute the predictions of $X^0$, we must have the estimate of $\beta$ in hand. Thus, this two-stage least squares approach does not show how to compute $b_{IV}$; it shows a characteristic of $b_{IV}$.

Example 9.9 Instrumental Variables Estimates of the Consumption Function
The consumption function in Section 9.3.1 was estimated by nonlinear least squares without accounting for the nature of the data that would certainly induce correlation between $X^0$ and $\varepsilon$. As we did earlier, we will reestimate this model using the technique of instrumental variables. For this application, we will use the one-period lagged value of consumption and one- and two-period lagged values of income as instrumental variables. Table 9.3 reports the nonlinear least squares and instrumental variables estimates. Since we are using two periods of lagged values, two observations are lost. Thus, the least squares estimates are not the same as those reported earlier.

The instrumental variable estimates differ considerably from the least squares estimates. The differences can be deceiving, however. Recall that the MPC in the model is $\beta\gamma Y^{\gamma-1}$. The 2000.4 value for DPI that we examined earlier was 6634.9. At this value, the instrumental variables and least squares estimates of the MPC are 0.8567 with an estimated standard error of 0.01234 and 1.08479 with an estimated standard error of 0.008694, respectively. These values do differ a bit but less than the quite large differences in the parameters might have led one to expect. We do note that both of these are considerably greater than the estimate in the linear model, 0.9222 (and greater than one, which seems a bit implausible).

TABLE 9.3 Nonlinear Least Squares and Instrumental Variable Estimates

                  Instrumental Variables          Least Squares
    Parameter     Estimate        Std. Error      Estimate        Std. Error
    α             627.031         26.6063         468.215         22.788
    β             0.040291        0.006050        0.0971598       0.01064
    γ             1.34738         0.016816        1.24892         0.1220
    σ             57.1681         —               49.87998        —
    e′e           650,369.805     —               495,114.490     —

9.5.2 TWO-STEP NONLINEAR LEAST SQUARES ESTIMATION

In this section, we consider a special case of this general class of models in which the nonlinear regression model depends on a second set of parameters that is estimated separately. The model is

$$y = h(x, \beta, w, \gamma) + \varepsilon.$$

We consider cases in which the auxiliary parameter $\gamma$ is estimated separately in a model that depends on an additional set of variables $w$. This first step might be a least squares regression, a nonlinear regression, or a maximum likelihood estimation. The parameters $\gamma$ will usually enter $h(.)$ through some function of $\gamma$ and $w$, such as an expectation. The second step then consists of a nonlinear regression of $y$ on $h(x, \beta, w, c)$ in which $c$ is the first-round estimate of $\gamma$. To put this in context, we will develop an example.

The estimation procedure is as follows.

1. Estimate $\gamma$ by least squares, nonlinear least squares, or maximum likelihood. We assume that this estimator, however obtained, denoted $c$, is consistent and asymptotically normally distributed with asymptotic covariance matrix $V_c$. Let $\hat V_c$ be any appropriate estimator of $V_c$.
2. Estimate $\beta$ by nonlinear least squares regression of $y$ on $h(x, \beta, w, c)$. Let $\sigma^2 V_b$ be the asymptotic covariance matrix of this estimator of $\beta$, assuming $\gamma$ is known, and let $s^2\hat V_b$ be any appropriate estimator of $\sigma^2 V_b = \sigma^2(X^{0\prime}X^0)^{-1}$, where $X^0$ is the matrix of pseudoregressors evaluated at the true parameter values, $x_i^0 = \partial h(x_i, \beta, w_i, \gamma)/\partial\beta$.

The argument for consistency of $b$ is based on the Slutsky theorem, D.12, as we treat $b$ as a function of $c$ and the data. We require, as usual, well-behaved pseudoregressors. As long as $c$ is consistent for $\gamma$, the large-sample behavior of the estimator of $\beta$ conditioned on $c$ is the same as that conditioned on $\gamma$, that is, as if $\gamma$ were known. Asymptotic normality is obtained along similar lines (albeit with greater difficulty). The asymptotic covariance matrix for the two-step estimator is provided by the following theorem.

THEOREM 9.4 Asymptotic Distribution of the Two-Step Nonlinear Least Squares Estimator [Murphy and Topel (1985)]
Under the standard conditions assumed for the nonlinear least squares estimator, the second-step estimator of $\beta$ is consistent and asymptotica
llynormallydis-tributedwithasymptoticcovariancematrixV∗=σ2V+V[CVC−CVR−RVC]V,bbbcccb\nGreene-50240bookJune11,200219:33CHAPTER9✦NonlinearRegressionModels185THEOREM9.4(Continued)wheren102∂h(xi,β,wi,γ)C=nplimxiεˆin∂γi=1andn10∂g(wi,γ)R=nplimxiεˆi.n∂γi=1Thefunction∂g(.)/∂γinthedefinitionofRisthegradientoftheithterminthelog-likelihoodfunctionifγisestimatedbymaximumlikelihood.(Thepreciseformisshownbelow.)Ifγappearsastheparametervectorinaregressionmodel,zi=f(wi,γ)+ui,(9-32)then∂g(.)/∂γwillbeaderivativeofthesumofsquareddeviationsfunction,∂g(.)∂f(wi,γ)=ui.∂γ∂γIfthisisalinearregression,thenthederivativevectorisjustwi.Implementationofthetheoremrequiresthattheasymptoticcovariancematrixcomputedasusualforthesecond-stepestimatorbasedoncinsteadofthetrueγmustbecorrectedforthepresenceoftheestimatorcinb.Beforedevelopingtheapplication,wenotehowsomeimportantspecialcasesarehandled.Ifγentersh(.)asthecoefficientvectorinapredictionofanothervariableinaregressionmodel,thenwehavethefollowingusefulresults.Case1Linearregressionmodels.Ifh(.)=xβ+δE[z|w]+ε,whereE[z|w]=iiiiiiwγ,thenthetwomodelsarejustfitbylinearleastsquaresasusual.Theregressioniforyincludesanadditionalvariable,wc.Letdbethecoefficientonthisnewvariable.iThennCˆ=de2xwiiii=1andnRˆ=(eu)xw.iiiii=1Case2Uncorrelatedlinearregressionmodels.InCase1,ifthetworegressiondistur-bancesareuncorrelated,thenR=0.Case2isgeneral.ThetermsinRvanishasymptoticallyiftheregressionshaveuncorrelateddisturbances,whethereitherorbothofthemarelinear.Thissituationwillbequitecommon.\nGreene-50240bookJune11,200219:33186CHAPTER9✦NonlinearRegressionModelsCase3Predictionfromanonlinearmodel.InCases1and2,ifE[zi|wi]isanonlinearfunctionratherthanalinearfunction,thenitisonlynecessarytochangewtow0=ii∂E[zi|wi]/∂γ—avectorofpseudoregressors—inthedefinitionsofCandR.Case4Subsetofregressors.Incase2(butnotincase1),ifwcontainsallthevariablesthatareinx,thentheappropriateestimatorissimplyc2s2V∗=s21+u(X∗X∗)−1,bes2ewhereX∗includesallthevariablesinxaswellasthepredictionforz.Allthesecasescarryo
vertothecaseofanonlinearregressionfunctionfory.Itisonlynecessarytoreplacex,theactualregressorsinthelinearmodel,withx0,theiipseudoregressors.9.5.3TWO-STEPESTIMATIONOFACREDITSCORINGMODELGreene(1995c)estimatesamodelofconsumerbehaviorinwhichthedependentvari-ableofinterestisthenumberofmajorderogatoryreportsrecordedinthecredithistoryofasampleofapplicantsforatypeofcreditcard.Infact,thisparticularvariableisoneofthemostsignificantdeterminantsofwhetheranapplicationforaloanoracreditcardwillbeaccepted.Thisdependentvariableyisadiscretevariablethatatanytime,formostconsumers,willequalzero,butforasignificantfractionwhohavemissedseveralrevolvingcreditpayments,itwilltakeapositivevalue.Thetypicalvaluesarezero,one,ortwo,butvaluesupto,say,10arenotunusual.ThiscountvariableismodeledusingaPoissonregressionmodel.ThismodelappearsinSectionsB.4.8,22.2.1,22.3.7,and21.9.Theprobabilitydensityfunctionforthisdiscreterandomvariableise−λiλjiProb[yi=j]=.j!Theexpectedvalueofyiisλi,sodependingonhowλiisspecifiedanddespitetheunusualnatureofthedependentvariable,thismodelisalinearornonlinearregressionmodel.Wewillconsiderbothcases,thelinearmodelE[y|x]=xβandthemorecommoniiixβloglinearmodelE[yi|xi]=ei,whereximightincludesuchcovariatesasage,income,andtypicalmonthlycreditaccountexpenditure.Thismodelisusuallyestimatedbymaximumlikelihood.Butsinceitisabonafideregressionmodel,leastsquares,eitherlinearornonlinear,isaconsistent,ifinefficient,estimator.InGreene’sstudy,asecondarymodelisfitfortheoutcomeofthecreditcardapplication.Letzidenotethisoutcome,coded1iftheapplicationisaccepted,0ifnot.Forpurposesofthisexample,wewillmodelthisoutcomeusingalogitmodel(seetheextensivedevelopmentinChapter21,esp.Section21.3).ThuswγeiProb[zi=1]=P(wi,γ)=,wγ1+eiwherewimightincludeage,income,whethertheapplicantsowntheirownhomes,andwhethertheyareself-employed;thesearethesortsofvariablesthat“creditscoring”agenciesexamine.\nGreene-50240bookJune11,200219:33CHAPTER9✦NonlinearRegressionModels187Finally,wesupposethattheprobabilityofacceptanceenterst
heregressionmodelasanadditionalexplanatoryvariable.(Weconcedethatthepoweroftheunderlyingtheorywanesabithere.)Thus,ournonlinearregressionmodelisE[y|x]=xβ+δP(w,γ)(linear)iiiiorxβ+δP(w,γ)E[y|x]=eii(loglinear,nonlinear).iiThetwo-stepestimationprocedureconsistsofestimationofγbymaximumlikelihood,thencomputingPˆi=P(wi,c),andfinallyestimatingbyeitherlinearornonlinearleastsquares[β,δ]usingPˆiasaconstructedregressor.Wewilldevelopthetheoreticalbackgroundfortheestimatorandthencontinuewithimplementationoftheestimator.ForthePoissonregressionmodel,whentheconditionalmeanfunctionislinear,x0=x.Ifitisloglinear,theniix0=∂λ/∂β=∂exp(xβ)/∂β=λx,iiiiiwhichissimpletocompute.WhenP(wi,γ)isincludedinthemodel,thepseudoregressorvectorx0includesthisvariableandthecoefficientvectoris[β,δ].Theni1nVˆ=[y−h(x,w,b,c)]2×(X0X0)−1,biiini=1whereX0iscomputedat[b,d,c],thefinalestimates.Forthelogitmodel,thegradientofthelog-likelihoodandtheestimatorofVcaregiveninSection21.3.1.Theyare∂lnf(zi|wi,γ)/∂γ=[zi−P(wi,γ)]wiand−1nVˆ=[z−P(w,γˆ)]2ww.ciiiii=1Notethatforthismodel,weareactuallyinsertingapredictionfromaregressionmodelofsorts,sinceE[zi|wi]=P(wi,γ).TocomputeC,wewillrequire∂h(.)/∂γ=λiδ∂Pi/∂γ=λiδPi(1−Pi)wi.TheremainingpartsofthecorrectedcovariancematrixarecomputedusingnCˆ=λˆxˆ0εˆ2[λˆdPˆ(1−Pˆ)]wiiiiiiii=1andnRˆ=λˆxˆ0εˆ(z−Pˆ)w.iiiiiii=1(Iftheregressionmodelislinear,thenthethreeoccurrencesofλiareomitted.)\nGreene-50240bookJune11,200219:33188CHAPTER9✦NonlinearRegressionModelsTABLE9.4Two-StepEstimatesofaCreditScoringModelxβ+δPStep1.P(w,γ)Step2.E[y|x]=xβ+δPStep2.E[y|x]=eiiiiiiiiiVariableEst.St.Er.Est.St.Er.*St.Er.*Est.St.Er.Se.Er.*Constant2.72361.0970−1.06281.19071.2681−7.19696.270849.3854Age−0.73280.029610.0216610.0187560.0200890.0799840.081350.61183Income0.219190.142960.034730.072660.082079−0.13280070.213801.8687Self-empl−1.94391.01270OwnRent0.189370.49817Expend−0.0007870.0003680.000413−0.280080.964290.96969P(wi,γ)1.04081.06531.1772996.990985.797849.34414lnL−53.925ee95.550680.31265s0.9774960.89617R20.054330.20514Mean0.7
30.360.36DatausedintheapplicationarelistedinAppendixTableF9.1.Weusethefollowingmodel:Prob[zi=1]=P(age,income,ownrent,self-employed),E[yi]=h(age,income,expend).Wehaveused100ofthe1,319observationsusedintheoriginalstudy.Table9.4reportstheresultsofthevariousregressionsandcomputations.ThecolumndenotedSt.Er.*containsthecorrectedstandarderror.ThecolumnmarkedSt.Er.containsthestandarderrorsthatwouldbecomputedignoringthetwo-stepnatureofthecomputations.Forthelinearmodel,weusedee/ntoestimateσ2.Asexpected,accountingforthevariabilityincincreasesthestandarderrorsofthesecond-stepestimator.Thelinearmodelappearstogivequitedifferentresultsfromthenonlinearmodel.Butthiscanbedeceiving.Inthelinearmodel,∂E[yi|xi,Pi]/∂xi=βwhereasinthenonlinearmodel,thecounterpartisnotβbutλiβ.Thevalueofλiatthemeanvaluesofallthevariablesinthesecond-stepmodelisroughly0.36(themeanofthedependentvariable),sothemarginaleffectsinthenonlinearmodelare[0.0224,−0.0372,−0.07847,1.9587],respectively,includingPibutnottheconstant,whicharereasonablysimilartothoseforthelinearmodel.Tocomputeanasymptoticcovariancematrixfortheestimatedmarginaleffects,wewouldusethedeltamethodfromSectionsD.2.7andD.3.1.Forconvenience,letb=[b,d],andletv=[x,Pˆ],piiiwhichjustaddsPitotheregressorvectorsoweneednottreatitseparately.Thenthevectorofmarginaleffectsism=exp(vb)×b=λb.ippipThematrixofderivativesisG=∂m/∂b=λ(I+bv),pipisotheestimatoroftheasymptoticcovariancematrixformis∗Est.Asy.Var[m]=GVbG.\nGreene-50240bookJune11,200219:33CHAPTER9✦NonlinearRegressionModels189TABLE9.5MaximumLikelihoodEstimatesofSecond-StepRegressionModelConstantAgeIncomeExpendPEstimate−6.32000.0731060.045236−0.006894.6324Std.Error3.93080.0542460.174110.002023.6618Corr.Std.Error9.03210.1028670.4023680.0039859.918233Onemightbetemptedtotreatλiasaconstant,inwhichcaseonlythefirstterminthequadraticformwouldappearandthecomputationwouldamountsimplytomul-tiplyingtheasymptoticstandarderrorsforbpbyλi.Thisapproximationwouldleavetheasymptotictratiosunchanged,whereasmakingthefullcorrectionwill
change the entire covariance matrix. The approximation will generally lead to an understatement of the correct standard errors.

Finally, although this treatment is not discussed in detail until Chapter 18, we note at this point that nonlinear least squares is an inefficient estimator in the Poisson regression model; maximum likelihood is the preferred, efficient estimator. Table 9.5 presents the maximum likelihood estimates with both corrected and uncorrected estimates of the asymptotic standard errors of the parameter estimates. (The full discussion of the model is given in Section 21.9.) The corrected standard errors are computed using the methods shown in Section 17.7. A comparison of these estimates with those in the third set of Table 9.4 suggests the clear superiority of the maximum likelihood estimator.

9.6 SUMMARY AND CONCLUSIONS

In this chapter, we extended the regression model to a form which allows nonlinearity in the parameters in the regression function. The results for interpretation, estimation, and hypothesis testing are quite similar to those for the linear model. The two crucial differences between the two models are, first, the more involved estimation procedures needed for the nonlinear model and, second, the ambiguity of the interpretation of the coefficients in the nonlinear model (since the derivatives of the regression are often nonconstant, in contrast to those in the linear model). Finally, we added two additional levels of generality to the model. A nonlinear instrumental variables estimator is suggested to accommodate the possibility that the disturbances in the model are correlated with the included variables. In the second application, two-step nonlinear least squares is suggested as a method of allowing a model to be fit while including functions of previously estimated parameters.

Key Terms and Concepts

• Box–Cox transformation
• Consistency
• Delta method
• GMM estimator
• Identification
• Instrumental variables estimator
• Iteration
• Linearized regression model
• LM test
• Logit
• Multicollinearity
• Nonlinear model
• Normalization
• Orthogonality condition
• Overidentifying restrictions
• PE test
• Pseudoregressors
• Semiparametric
• Starting values
• Translog
• Two-step estimation
• Wald test
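The delta-method calculation described in Section 9.5.3 for the marginal effects of the loglinear model, $m = \lambda_i b_p$ with $G = \lambda_i(I + b_p v_i')$, can be sketched numerically. This is only a minimal illustration and not the computation behind Table 9.4: the coefficient vector, regressor values, and covariance matrix below are made-up inputs, and the function names `marginal_effects` and `delta_method_cov` are hypothetical.

```python
import math


def marginal_effects(b_p, v):
    """Return (lam, m), where lam = exp(v'b_p) and m = lam * b_p."""
    lam = math.exp(sum(vj * bj for vj, bj in zip(v, b_p)))
    return lam, [lam * bj for bj in b_p]


def delta_method_cov(b_p, v, V):
    """Return G V G', where G = dm/db_p' = lam * (I + b_p v')."""
    k = len(b_p)
    lam, _ = marginal_effects(b_p, v)
    # Row i, column j of G is lam * (1[i == j] + b_p[i] * v[j]).
    G = [[lam * ((1.0 if i == j else 0.0) + b_p[i] * v[j]) for j in range(k)]
         for i in range(k)]
    GV = [[sum(G[i][l] * V[l][j] for l in range(k)) for j in range(k)]
          for i in range(k)]
    # Multiply by G' on the right: (GV G')[i][j] = sum_l GV[i][l] * G[j][l].
    return [[sum(GV[i][l] * G[j][l] for l in range(k)) for j in range(k)]
            for i in range(k)]


# Made-up inputs, for illustration only (not the estimates in Table 9.4).
b_p = [0.08, -0.13, 7.0]            # assumed coefficients [b, d]
v = [3.0, 2.5, -0.2]                # assumed evaluation point [x, P]
V = [[0.010, 0.001, 0.000],         # assumed covariance matrix of b_p
     [0.001, 0.040, 0.002],
     [0.000, 0.002, 0.900]]

lam, m = marginal_effects(b_p, v)
cov_m = delta_method_cov(b_p, v, V)
std_errors = [math.sqrt(cov_m[i][i]) for i in range(len(m))]
print(lam, m, std_errors)
```

Dropping the $b_p v_i'$ term in `G` reproduces the shortcut mentioned in the text, in which the standard errors of $b_p$ are simply scaled by $\lambda_i$.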
Exercises

1. Describe how to obtain nonlinear least squares estimates of the parameters of the model $y = \alpha x^{\beta} + \varepsilon$.
2. Use MacKinnon, White, and Davidson’s PE test to determine whether a linear or loglinear production model is more appropriate for the data in Appendix Table F6.1. (The test is described in Section 9.4.3 and Example 9.8.)
3. Using the Box–Cox transformation, we may specify an alternative to the Cobb–Douglas model as

$$\ln Y = \alpha + \beta_k\frac{(K^{\lambda} - 1)}{\lambda} + \beta_l\frac{(L^{\lambda} - 1)}{\lambda} + \varepsilon.$$

Using Zellner and Revankar’s data in Appendix Table F9.2, estimate $\alpha$, $\beta_k$, $\beta_l$, and $\lambda$ by using the scanning method suggested in Section 9.3.2. (Do not forget to scale $Y$, $K$, and $L$ by the number of establishments.) Use (9-16), (9-12), and (9-13) to compute the appropriate asymptotic standard errors for your estimates. Compute the two output elasticities, $\partial\ln Y/\partial\ln K$ and $\partial\ln Y/\partial\ln L$, at the sample means of $K$ and $L$. [Hint: $\partial\ln Y/\partial\ln K = K\,\partial\ln Y/\partial K$.]
4. For the model in Exercise 3, test the hypothesis that $\lambda = 0$ using a Wald test, a likelihood ratio test, and a Lagrange multiplier test. Note that the restricted model is the Cobb–Douglas log-linear model.
5. To extend Zellner and Revankar’s model in a fashion similar to theirs, we can use the Box–Cox transformation for the dependent variable as well. Use the method of Example 17.6 (with $\theta = \lambda$) to repeat the study of the preceding two exercises. How do your results change?
6. Verify the following differential equation, which applies to the Box–Cox transformation:

$$\frac{d^i x^{(\lambda)}}{d\lambda^i} = \frac{1}{\lambda}\left[x^{\lambda}(\ln x)^i - \frac{i\,d^{i-1}x^{(\lambda)}}{d\lambda^{i-1}}\right]. \tag{9-33}$$

Show that the limiting sequence for $\lambda = 0$ is

$$\lim_{\lambda\to 0}\frac{d^i x^{(\lambda)}}{d\lambda^i} = \frac{(\ln x)^{i+1}}{i+1}. \tag{9-34}$$

These results can be used to great advantage in deriving the actual second derivatives of the log-likelihood function for the Box–Cox model.

10 NONSPHERICAL DISTURBANCES: THE GENERALIZED REGRESSION MODEL

10.1 INTRODUCTION

In Chapter 9, we extended the classical linear model to allow the conditional mean to be a nonlinear function.[1] But we retained the important assumptions about the disturbances: that they are uncorrelated with each other and that they have a constant variance, conditioned on the independent variables. In this and the next several chapters, we extend the multiple regression model to disturbances that violate these classical assumptions. The generalized linear regression model is

$$y = X\beta + \varepsilon,\qquad E[\varepsilon \mid X] = 0,\qquad E[\varepsilon\varepsilon' \mid X] = \sigma^2\Omega = \Sigma, \tag{10-1}$$

where $\Omega$ is a positive definite matrix. (The covariance matrix is written in the form $\sigma^2\Omega$ at several points so that we can obtain the classical model, $\sigma^2 I$, as a convenient special case.) As we will examine briefly below, the extension of the model to nonlinearity is relatively minor in comparison with the variants considered here. For present purposes, we will retain the linear specification and refer to our model simply as the generalized regression model.

[1] Recall that our definition of nonlinearity pertains to the estimation method required to obtain the parameter estimates, not to the way that they enter the regression function.

Two cases we will consider in detail are heteroscedasticity and autocorrelation. Disturbances are heteroscedastic when they have different variances. Heteroscedasticity usually arises in volatile high frequency time-series data such as daily observations in financial markets and in cross-section data where the scale of the dependent variable and the explanatory power of the model tend to vary across observations. Microeconomic data such as expenditure surveys are typical. The disturbances are still assumed to be uncorrelated across observations, so $\sigma^2\Omega$ would be

$$\sigma^2\Omega = \sigma^2\begin{bmatrix}\omega_{11} & 0 & \cdots & 0\\ 0 & \omega_{22} & \cdots & 0\\ \vdots & & \ddots & \vdots\\ 0 & 0 & \cdots & \omega_{nn}\end{bmatrix} = \begin{bmatrix}\sigma_1^2 & 0 & \cdots & 0\\ 0 & \sigma_2^2 & \cdots & 0\\ \vdots & & \ddots & \vdots\\ 0 & 0 & \cdots & \sigma_n^2\end{bmatrix}.$$

(The first mentioned situation involving financial data is more complex than this, and is examined in detail in Section 11.8.)

Autocorrelation is usually found in time-series data. Economic time series often display a “memory” in that variation around the regression function is not independent from one period to the next. The seasonally adjusted price and quantity series published by government agencies are examples. Time-series data are usually homoscedastic, so $\sigma^2\Omega$ might be

$$\sigma^2\Omega = \sigma^2\begin{bmatrix}1 & \rho_1 & \cdots & \rho_{n-1}\\ \rho_1 & 1 & \cdots & \rho_{n-2}\\ \vdots & & \ddots & \vdots\\ \rho_{n-1} & \rho_{n-2} & \cdots & 1\end{bmatrix}.$$

The values that appear off the diagonal depend on the model used for the disturbance. In most cases, consistent with the notion of a fading memory, the values decline as we move away from the diagonal.

Panel data sets, consisting of cross sections observed at several points in time, may exhibit both characteristics. We shall consider them in Chapter 14. This chapter presents some general results for this extended model. The next several chapters examine in detail specific types of generalized regression models.

Our earlier results for the classical model will have to be modified. We will take the same approach in this chapter on general results and in the next two on heteroscedasticity and serial correlation, respectively:

1. We first consider the consequences for the least squares estimator of the more general form of the regression model. This will include assessing the effect of ignoring the complication of the generalized model and of devising an appropriate estimation strategy, still based on least squares.
2. In subsequent sections, we will examine alternative estimation approaches that can make better use of the characteristics of the model. We begin with GMM estimation, which is robust and semiparametric. Minimal assumptions about $\Omega$ are made at this point.
3. We then narrow the assumptions and begin to look for methods of detecting the failure of the classical model; that is, we formulate procedures for testing the specification of the classical model against the generalized regression.
4. The final step in the analysis is to formulate parametric models that make specific assumptions about $\Omega$. Estimators in this setting are some form of generalized least squares or maximum likelihood.

The model is examined in general terms in this and the next two chapters. Major applications to panel data and multiple equation systems are considered in Chapters 13 and 14.

10.2 LEAST SQUARES AND INSTRUMENTAL VARIABLES ESTIMATION

The essential results for the classical model with spherical disturbances

$$E[\varepsilon \mid X] = 0$$

and

$$E[\varepsilon\varepsilon' \mid X] = \sigma^2 I \tag{10-2}$$

are presented in Chapters 2 through 8. To reiterate, we found that the ordinary least squares (OLS) estimator

$$b = (X'X)^{-1}X'y = \beta + (X'X)^{-1}X'\varepsilon \tag{10-3}$$

is best linear unbiased (BLU), consistent and asymptotically normally distributed (CAN), and if the disturbances are normally distributed, like other maximum likelihood estimators considered in Chapter 17, asymptotically efficient among all CAN estimators. We now consider which of these properties continue to hold in the model of (10-1).

To summarize, the least squares, nonlinear least squares, and instrumental variables estimators retain only some of their desirable properties in this model. Least squares remains unbiased,
consistent, and asymptotically normally distributed. It will, however, no longer be efficient (this claim remains to be verified), and the usual inference procedures are no longer appropriate. Nonlinear least squares and instrumental variables likewise remain consistent, but once again, the extension of the model brings about some changes in our earlier results concerning the asymptotic distributions. We will consider these cases in detail.

10.2.1 FINITE-SAMPLE PROPERTIES OF ORDINARY LEAST SQUARES

By taking expectations on both sides of (10-3), we find that if $E[\varepsilon \mid X] = 0$, then

$$E[b] = E_X\big[E[b \mid X]\big] = \beta. \tag{10-4}$$

Therefore, we have the following theorem.

THEOREM 10.1 Finite Sample Properties of b in the Generalized Regression Model
If the regressors and disturbances are uncorrelated, then the unbiasedness of least squares is unaffected by violations of assumption (10-2). The least squares estimator is unbiased in the generalized regression model. With nonstochastic regressors, or conditional on $X$, the sampling variance of the least squares estimator is

$$\begin{aligned}\operatorname{Var}[b \mid X] &= E[(b - \beta)(b - \beta)' \mid X]\\ &= E[(X'X)^{-1}X'\varepsilon\varepsilon'X(X'X)^{-1} \mid X]\\ &= (X'X)^{-1}X'(\sigma^2\Omega)X(X'X)^{-1}\\ &= \frac{\sigma^2}{n}\left(\frac{1}{n}X'X\right)^{-1}\left(\frac{1}{n}X'\Omega X\right)\left(\frac{1}{n}X'X\right)^{-1}.\end{aligned} \tag{10-5}$$

If the regressors are stochastic, then the unconditional variance is $E_X[\operatorname{Var}[b \mid X]]$. In (10-3), $b$ is a linear function of $\varepsilon$. Therefore, if $\varepsilon$ is normally distributed, then

$$b \mid X \sim N[\beta,\ \sigma^2(X'X)^{-1}(X'\Omega X)(X'X)^{-1}].$$

The end result is that $b$ has properties that are similar to those in the classical regression case. Since the variance of the least squares estimator is not $\sigma^2(X'X)^{-1}$, however, statistical inference based on $s^2(X'X)^{-1}$ may be misleading. Not only is this the wrong matrix to be used, but $s^2$ may be a biased estimator of $\sigma^2$. There is usually no way to know whether $\sigma^2(X'X)^{-1}$ is larger or smaller than the true variance of $b$, so even with a good estimate of $\sigma^2$, the conventional estimator of $\operatorname{Var}[b]$ may not be particularly useful. Finally, since we have dispensed with the fundamental underlying assumption, the familiar inference procedures based on the $F$ and $t$ distributions will no longer be appropriate. One issue we will explore at several points below is how badly one is likely to go awry if the result in (10-5) is ignored and if the use of the familiar procedures based on $s^2(X'X)^{-1}$ is continued.

10.2.2 ASYMPTOTIC PROPERTIES OF LEAST SQUARES

If $\operatorname{Var}[b \mid X]$ converges to zero, then $b$ is mean square consistent. With well-behaved regressors, $(X'X/n)^{-1}$ will converge to a constant matrix. But $(\sigma^2/n)(X'\Omega X/n)$ need not converge at all. By writing this product as

$$\frac{\sigma^2}{n}\left(\frac{X'\Omega X}{n}\right) = \frac{\sigma^2}{n}\left(\frac{\sum_{i=1}^{n}\sum_{j=1}^{n}\omega_{ij}x_i x_j'}{n}\right) \tag{10-6}$$

we see that though the leading constant will, by itself, converge to zero, the matrix is a sum of $n^2$ terms, divided by $n$. Thus, the product is a scalar that is $O(1/n)$ times a matrix that is, at least at this juncture, $O(n)$, which is $O(1)$. So, it does appear at first blush that if the product in (10-6) does converge, it might converge to a matrix of nonzero constants. In this case, the covariance matrix of the least squares estimator would not converge to zero, and consistency would be difficult to establish. We will examine in some detail the conditions under which the matrix in (10-6) converges to a constant matrix.[2] If it does, then since $\sigma^2/n$ does vanish, ordinary least squares is consistent as well as unbiased.

THEOREM 10.2 Consistency of OLS in the Generalized Regression Model
If $Q = \operatorname{plim}(X'X/n)$ and $\operatorname{plim}(X'\Omega X/n)$ are both finite positive definite matrices, then $b$ is consistent for $\beta$. Under the assumed conditions,

$$\operatorname{plim} b = \beta. \tag{10-7}$$

The conditions in Theorem 10.2 depend on both $X$ and $\Omega$. An alternative formula[3] that separates the two components is as follows. Ordinary least squares is consistent in the generalized regression model if:

1. The smallest characteristic root of $X'X$ increases without bound as $n \to \infty$, which implies that $\operatorname{plim}(X'X)^{-1} = 0$. If the regressors satisfy the Grenander conditions G1 through G3 of Section 5.2, then they will meet this requirement.
2. The largest characteristic root of $\Omega$ is finite for all $n$. For the heteroscedastic model, the variances are the characteristic roots, which requires them to be finite. For models with autocorrelation, the requirements are that the elements of $\Omega$ be finite and that the off-diagonal elements not be too large relative to the diagonal elements. We will examine this condition at several points below.

[2] In order for the product in (10-6) to vanish, it would be sufficient for $(X'\Omega X/n)$ to be $O(n^{\delta})$ where $\delta < 1$.
[3] Amemiya (1985, p. 184).

The least squares estimator is asymptotically normally distributed if the limiting distribution of

$$\sqrt{n}\,(b - \beta) = \left(\frac{X'X}{n}\right)^{-1}\frac{1}{\sqrt{n}}X'\varepsilon \tag{10-8}$$

is normal.
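To make the comparison in Section 10.2.1 concrete, the sketch below evaluates both the conventional matrix $\sigma^2(X'X)^{-1}$ and the correct matrix in (10-5) for a regression on a constant and a single regressor with a known diagonal $\Omega$. It is a minimal illustration under stated assumptions: the data, the choice of $\Omega$ (variances proportional to $x_i^2$, scaled so that the diagonal sums to $n$), and the helper names `inv2`, `matmul2`, and `ols_covariances` are all hypothetical.

```python
def inv2(a):
    """Inverse of a 2x2 matrix stored as nested lists."""
    det = a[0][0] * a[1][1] - a[0][1] * a[1][0]
    return [[a[1][1] / det, -a[0][1] / det],
            [-a[1][0] / det, a[0][0] / det]]


def matmul2(a, b):
    """Product of two 2x2 matrices."""
    return [[sum(a[i][l] * b[l][j] for l in range(2)) for j in range(2)]
            for i in range(2)]


def ols_covariances(x, omega, sigma2=1.0):
    """Return (conventional, correct) covariance matrices of b given X.

    conventional = sigma2 * (X'X)^{-1}
    correct      = sigma2 * (X'X)^{-1} X' Omega X (X'X)^{-1}, as in (10-5),
    for X = [1, x] and a diagonal Omega with diagonal elements `omega`.
    """
    rows = [(1.0, xi) for xi in x]
    xtx = [[sum(r[i] * r[j] for r in rows) for j in range(2)]
           for i in range(2)]
    xtox = [[sum(w * r[i] * r[j] for r, w in zip(rows, omega))
             for j in range(2)] for i in range(2)]
    a = inv2(xtx)
    core = matmul2(matmul2(a, xtox), a)
    conventional = [[sigma2 * a[i][j] for j in range(2)] for i in range(2)]
    correct = [[sigma2 * core[i][j] for j in range(2)] for i in range(2)]
    return conventional, correct


# Made-up design: variance proportional to x_i^2, scaled so sum(omega) = n.
x = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]
raw = [xi ** 2 for xi in x]
omega = [len(x) * r / sum(raw) for r in raw]

conventional, correct = ols_covariances(x, omega)
print("slope variance, conventional formula:", conventional[1][1])
print("slope variance, formula (10-5):      ", correct[1][1])
```

In this particular design the conventional formula understates the sampling variance of the slope; with other patterns in $\Omega$ the error can go in the other direction, which is the point made in the text.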
Ifplim(XX/n)=Q,thenthelimitingdistributionoftheright-handsideisthesameasthatof11nv=Q−1√Xε=Q−1√xε,(10-9)n,LSiinni=1wherexisarowofX(assuming,ofcourse,thatthelimitingdistributionexistsatall).iThequestionnowiswhetheracentrallimittheoremcanbeapplieddirectlytov.Ifthedisturbancesaremerelyheteroscedasticandstilluncorrelated,thentheanswerisgenerallyyes.Infact,wealreadyshowedthisresultinSection5.5.2whenweinvokedtheLindberg–Fellercentrallimittheorem(D.19)ortheLyapounovTheorem(D.20).Thetheoremsallowunequalvariancesinthesum.Theexactvarianceofthesumis1nσ2nExVar√xiεixi=ωiQi,nni=1i=1which,forourpurposes,wewouldrequiretoconvergetoapositivedefinitematrix.Inouranalysisoftheclassicalmodel,theheterogeneityofthevariancesarosebecauseoftheregressors,butwestillachievedthelimitingnormaldistributionin(5-7)through(5-14).Allthathaschangedhereisthatthevarianceofεvariesacrossobservationsaswell.Therefore,theproofofasymptoticnormalityinSection5.2.2isgeneralenoughtoincludethismodelwithoutmodification.AslongasXiswellbehavedandthediagonalelementsofarefiniteandwellbehaved,theleastsquaresestimatorisasymptoticallynormallydistributed,withthecovariancematrixgivenin(10-5).Thatis:Intheheteroscedasticcase,ifthevariancesofεiarefiniteandarenotdominatedbyanysingleterm,sothattheconditionsoftheLindberg–Fellercentrallimittheoremapplytovn,LSin(10-9),thentheleastsquaresestimatorisasymptoticallynormallydistributedwithcovariancematrixσ21Asy.Var[b]=Q−1plimXXQ−1.(10-10)nnForthemostgeneralcase,asymptoticnormalityismuchmoredifficulttoestablishbecausethesumsin(10-9)arenotnecessarilysumsofindependentorevenuncorrelatedrandomvariables.Nonetheless,Amemiya(1985,p.187)andAnderson(1971)haveshowntheasymptoticnormalityofbinamodelofautocorrelateddisturbancesgeneralenoughtoincludemostofthesettingswearelikelytomeetinpractice.Wewillrevisit\nGreene-50240bookJune11,200218:51196CHAPTER10✦NonsphericalDisturbancesthisissueinChapters19and20whenweexaminetimeseriesmodeling.Wecanconcludethat,exceptinparticularlyunfavorablecases,wehavethefollowin
gtheorem.THEOREM10.3AsymptoticDistributionofbintheGRModelIftheregressorsaresufficientlywellbehavedandtheoff-diagonaltermsindiminishsufficientlyrapidly,thentheleastsquaresestimatorisasymptoticallynormallydistributedwithmeanβandcovariancematrixgivenin(10-10).Therearetwocasesthatremaintobeconsidered,thenonlinearregressionmodelandtheinstrumentalvariablesestimator.10.2.3ASYMPTOTICPROPERTIESOFNONLINEARLEASTSQUARESIftheregressionfunctionisnonlinear,thentheanalysisofthissectionmustbeappliedtothepseudoregressorsx0ratherthantheindependentvariables.Asidefromthiscon-isideration,nonewresultsareneeded.Wecanjustapplythisdiscussiontothelinearizedregressionmodel.Undermostconditions,theresultslistedaboveapplytothenonlinearleastsquaresestimatoraswellasthelinearleastsquaresestimator.410.2.4ASYMPTOTICPROPERTIESOFTHEINSTRUMENTALVARIABLESESTIMATORThesecondestimatortobeconsideredistheinstrumentalvariablesestimatorthatweconsideredinSections5.4forthelinearmodeland9.5.1forthenonlinearmodel.Wewillconfineourattentiontothelinearmodel.Thenonlinearcasecanbeobtainedbyapplyingourresultstothelinearizedregression.Toreview,weconsideredcasesinwhichtheregressorsXarecorrelatedwiththedisturbancesε.Ifthisisthecase,asinthetime-seriesmodelsandtheerrorsinvariablesmodelsthatweexaminedearlier,thenbisneitherunbiasednorconsistent.5Intheclassicalmodel,weconstructedanestimatoraroundasetofvariablesZthatwereuncorrelatedwithε,b=[XZ(ZZ)−1ZX]−1XZ(ZZ)−1ZyIV(10-11)=β+[XZ(ZZ)−1ZX]−1XZ(ZZ)−1Zε.SupposethatXandZarewellbehavedinthesensediscussedinSection5.4.Thatis,plim(1/n)ZZ=Q,apositivedefinitematrix,ZZplim(1/n)ZX=Q=Q,anonzeromatrix,ZXXZplim(1/n)XX=Q,apositivedefinitematrix.XX4DavidsonandMacKinnon(1993)considerthiscaseatlength.5Itmaybeasymptoticallynormallydistributed,butaroundameanthatdiffersfromβ.\nGreene-50240bookJune11,200218:51CHAPTER10✦NonsphericalDisturbances197Toavoidastringofmatrixcomputationsthatmaynotfitonasingleline,forconveniencelet−1−1−1QXX.Z=QXZQZZQZXQXZQZZ−1−1−111111=plimXZZZZXXZZZ.nnnnnIfZisavalidsetofinstrume
ntalvariables,thatis,ifthesecondtermin(10-11)vanishesasymptotically,then1plimb=β+QplimZε=β.IVXX.ZnThisresultisexactlythesameonewehadbefore.Wemightnotethatattheseveralpointswherewehaveestablishedunbiasednessorconsistencyoftheleastsquaresorinstrumentalvariablesestimator,thecovariancematrixofthedisturbancevectorhasplayednorole;unbiasednessisapropertyofthemeans.Assuch,thisresultshouldcomeasnosurprise.ThelargesamplebehaviorofbIVdependsonthebehaviorof1nvn,IV=√ziεi.ni=1ThisresultisexactlytheoneweanalyzedinSection5.4.Ifthesamplingdistributionofvnconvergestoanormaldistribution,thenwewillbeabletoconstructtheasymptoticdistributionforbIV.ThissetofconditionsisthesamethatwasnecessaryforXwhenweconsideredbabove,withZinplaceofX.WewillonceagainrelyontheresultsofAnderson(1971)orAmemiya(1985)thatunderverygeneralconditions,n1d21√ziεi−→N0,σplimZZ.nni=1Withtheotherresultsalreadyinhand,wenowhavethefollowing.THEOREM10.4AsymptoticDistributionoftheIVEstimatorintheGeneralizedRegressionModelIftheregressorsandtheinstrumentalvariablesarewellbehavedinthefashionsdiscussedabove,thenabIV∼N[β,VIV],where(10-12)σ21V=(Q)plimZZ(Q).IVXX.ZXX.Znn\nGreene-50240bookJune11,200218:51198CHAPTER10✦NonsphericalDisturbances10.3ROBUSTESTIMATIONOFASYMPTOTICCOVARIANCEMATRICESThereisaremainingquestionregardingallthepreceding.Inviewof(10-5),isitneces-sarytodiscardordinaryleastsquaresasanestimator?Certainlyifisknown,then,asshowninSection10.5,thereisasimpleandefficientestimatoravailablebasedonit,andtheanswerisyes.Ifisunknownbutitsstructureisknownandwecanestimateusingsampleinformation,thentheanswerislessclear-cut.Inmanycases,basingestimationofβonsomealternativeprocedurethatusesanˆwillbepreferabletoordinaryleastsquares.ThissubjectiscoveredinChapters11to14.Thethirdpossibilityisthatiscompletelyunknown,bothastoitsstructureandthespecificvaluesofitselements.Inthissituation,leastsquaresorinstrumentalvariablesmaybetheonlyestimatoravail-able,andassuch,theonlyavailablestrategyistotrytodeviseanestimatorfortheappropriateasymptoticcovar
iancematrixofb.Ifσ2wereknown,thentheestimatoroftheasymptoticcovariancematrixofbin(10-10)wouldbe−1−11111V=XXX[σ2]XXX.OLSnnnnForthenonlinearleastsquaresestimator,wereplaceXwithX0.Fortheinstrumen-talvariablesestimator,theleft-andright-sidematricesarereplacedwiththissampleestimatesofQanditstranspose(usingX0againforthenonlinearinstrumentalvari-XX.Zablesestimator),andZreplacesXinthecentermatrix.Inallthesecases,thematricesofsumsofsquaresandcrossproductsintheleftandrightmatricesaresampledatathatarereadilyestimable,andtheproblemisthecentermatrixthatinvolvestheunknownσ2.Forestimationpurposes,notethatσ2isnotaseparateunknownparameter.Sinceisanunknownmatrix,itcanbescaledarbitrarily,saybyκ,andwithσ2scaledby1/κ,thesameproductremains.Inourapplications,wewillremovetheindeterminacybyassumingthattr()=n,asitiswhenσ2=σ2Iintheclassicalmodel.Fornow,justlet=σ2.Itmightseemthattoestimate(1/n)XX,anestimatorof,whichcontainsn(n+1)/2unknownparameters,isrequired.Butfortunately(sincewithnobservations,thismethodisgoingtobehopeless),thisobservationisnotquiteright.WhatisrequiredisanestimatoroftheK(K+1)/2unknownelementsinthematrix1nnplimQ=plimσxx.∗ijijni=1j=1ThepointisthatQ∗isamatrixofsumsofsquaresandcrossproductsthatinvolvesσijandtherowsofX(orZorX0).Theleastsquaresestimatorbisaconsistentestimatorofβ,whichimpliesthattheleastsquaresresidualseiare“pointwise”consistentesti-matorsoftheirpopulationcounterpartsεi.Thegeneralapproach,then,willbetouseXandetodeviseanestimatorofQ∗.Considertheheteroscedasticitycasefirst.Weseekanestimatorof1nQ=σ2xx.∗iiini=1\nGreene-50240bookJune11,200218:51CHAPTER10✦NonsphericalDisturbances199White(1980)hasshownthatunderverygeneralconditions,theestimator1nS=e2xx(10-13)0iiini=1hasplimS=plimQ.60∗WecansketchaproofofthisresultusingtheresultsweobtainedinSection5.2.7NotefirstthatQ∗isnotaparametermatrixinitself.ItisaweightedsumoftheouterproductsoftherowsofX(orZfortheinstrumentalvariablescase).Thus,weseeknotto“estimate”Q∗,buttofindafunctionofthesampledatathatwillbearbitrarilyclosetothisfu
nctionofthepopulationparametersasthesamplesizegrowslarge.Thedistinctionisimportant.Wearenotestimatingthemiddlematrixin(10-10)or(10-12);weareattemptingtoconstructamatrixfromthesampledatathatwillbehavethesamewaythatthismatrixbehaves.Inessence,ifQ∗convergestoafinitepositivematrix,thenwewouldbelookingforafunctionofthesampledatathatconvergestothesamematrix.Supposethatthetruedisturbancesεicouldbeobserved.TheneachterminQ∗wouldequalE[ε2xx|x].Withsomefairlymildassumptionsaboutx,then,wecouldiiiiiinvokealawoflargenumbers(seeTheoremsD.2throughD.4.)tostatethatifQ∗hasaprobabilitylimit,then1n1nplim=σ2xx=plimε2xx.iiiiiinni=1i=1ThefinaldetailistojustifythereplacementofεiwitheiinS0.Theconsistencyofbforβissufficientfortheargument.(Actually,residualsbasedonanyconsistentestimatorofβwouldsufficeforthisestimator,butasofnow,borbIVistheonlyoneinhand.)TheendresultisthattheWhiteheteroscedasticityconsistentestimator−1n−11111Est.Asy.Var[b]=XXe2xxXXiiinnnn(10-14)i=1=n(XX)−1S(XX)−10canbeusedtoestimatetheasymptoticcovariancematrixofb.Thisresultisextremelyimportantanduseful.8Itimpliesthatwithoutactuallyspec-ifyingthetypeofheteroscedasticity,wecanstillmakeappropriateinferencesbasedontheresultsofleastsquares.Thisimplicationisespeciallyusefulifweareunsureoftheprecisenatureoftheheteroscedasticity(whichisprobablymostofthetime).WewillpursuesomeexamplesinChapter11.6SeealsoEicker(1967),Horn,Horn,andDuncan(1975),andMacKinnonandWhite(1985).7Wewillgiveonlyabroadsketchoftheproof.FormalresultsappearinWhite(1980)and(2001).8FurtherdiscussionandsomerefinementsmaybefoundinCragg(1982).CraggshowshowWhite’sobserva-tioncanbeextendedtodeviseanestimatorthatimprovesontheefficiencyofordinaryleastsquares.\nGreene-50240bookJune11,200218:51200CHAPTER10✦NonsphericalDisturbancesTheextensionofWhite’sresulttothemoregeneralcaseofautocorrelationismuchmoredifficult.Thenaturalcounterpartforestimating1nnQ=σxx∗ijijni=1j=1wouldbe(10-15)1nnQˆ=eexx.∗ijijni=1j=1Buttherearetwoproblemswiththisestimator,onetheoretical,whichappliestoQ∗aswell,a
nd one practical, which is specific to the latter. Unlike the heteroscedasticity case, the matrix in (10-15) is $1/n$ times a sum of $n^2$ terms, so it is difficult to conclude yet that it will converge to anything at all. This application is most likely to arise in a time-series setting. To obtain convergence, it is necessary to assume that the terms involving unequal subscripts in (10-15) diminish in importance as $n$ grows. A sufficient condition is that terms with subscript pairs $|i-j|$ grow smaller as the distance between them grows larger. In practical terms, observation pairs are progressively less correlated as their separation in time grows. Intuitively, if one can think of weights with the diagonal elements getting a weight of 1.0, then the weights in the sum grow smaller as we move away from the diagonal. If we think of the sum of the weights rather than just the number of terms, then this sum falls off sufficiently rapidly that as $n$ grows large, the sum is of order $n$ rather than $n^2$. Thus, we achieve convergence of $Q_*$ by assuming that the rows of $X$ are well behaved and that the correlations diminish with increasing separation in time. (See Sections 5.3, 12.5, and 20.5 for a more formal statement of this condition.)

The practical problem is that $\hat{Q}_*$ need not be positive definite. Newey and West (1987a) have devised an estimator that overcomes this difficulty:

$$\hat{Q}_* = S_0 + \frac{1}{n}\sum_{l=1}^{L}\sum_{t=l+1}^{n} w_l\, e_t e_{t-l}\,(x_t x_{t-l}' + x_{t-l} x_t'), \qquad w_l = 1 - \frac{l}{L+1}. \tag{10-16}$$

The Newey–West autocorrelation consistent covariance estimator is surprisingly simple and relatively easy to implement.⁹ There is a final problem to be solved. It must be determined in advance how large $L$ is to be. We will examine some special cases in Chapter 12, but in general, there is little theoretical guidance. Current practice specifies $L \approx T^{1/4}$. Unfortunately, the result is not quite as crisp as that for the heteroscedasticity consistent estimator.

We have the result that $b$ and $b_{IV}$ are asymptotically normally distributed, and we have an appropriate estimator for the asymptotic covariance matrix. We have not specified the distribution of the disturbances, however. Thus, for inference purposes, the $F$ statistic is approximate at best. Moreover, for more involved hypotheses, the likelihood ratio and Lagrange multiplier tests are unavailable. That leaves the Wald statistic, including asymptotic "t ratios," as the main tool for statistical inference. We will examine a number of applications in the chapters to follow.

The White and Newey–West estimators are standard in the econometrics literature. We will encounter them at many points in the discussion to follow.

⁹ Both estimators are now standard features in modern econometrics computer programs. Further results on different weighting schemes may be found in Hayashi (2000, pp. 406–410).

10.4 GENERALIZED METHOD OF MOMENTS ESTIMATION

We will analyze this estimation technique in some detail in Chapter 18, so we will only sketch the important results here. It is useful to consider the instrumental variables case, as it is fairly general and we can easily specialize it to the simpler regression model if that is appropriate. Thus, we depart from the model specification in (10-1), but at this point, we no longer require that $E[\varepsilon_i \mid x_i] = 0$. Instead, we adopt the instrumental variables formulation in Section 10.2.4. That is, our model is

$$y_i = x_i'\beta + \varepsilon_i,$$
$$E[\varepsilon_i \mid z_i] = 0$$

for $K$ variables in $x_i$ and for some set of $L$ instrumental variables, $z_i$, where $L \ge K$. The earlier case of the generalized regression model arises if $z_i = x_i$, and the classical regression form results if we add $\Omega = I$ as well, so this is a convenient encompassing model framework.

In the next section on generalized least squares estimation, we will consider two cases, first with a known $\Omega$, then with an unknown $\Omega$ that must be estimated. In estimation by the generalized method of moments neither of these approaches is relevant because we begin with much less (assumed) knowledge about the data generating process. In particular, we will consider three cases:

• Classical regression: $\operatorname{Var}[\varepsilon_i \mid X, Z] = \sigma^2$,
• Heteroscedasticity: $\operatorname{Var}[\varepsilon_i \mid X, Z] = \sigma_i^2$,
• Generalized model: $\operatorname{Cov}[\varepsilon_t, \varepsilon_s \mid X, Z] = \sigma^2\omega_{ts}$,

where $Z$ and $X$ are the $n \times L$ and $n \times K$ observed data matrices. (We assume, as will often be true, that the fully general case will apply in a time series setting. Hence the change in the subscripts.) No specific distribution is assumed for the disturbances, conditional or unconditional.

The assumption $E[\varepsilon_i \mid z_i] = 0$ implies the following orthogonality condition:

$$\operatorname{Cov}[z_i, \varepsilon_i] = 0, \quad\text{or}\quad E[z_i(y_i - x_i'\beta)] = 0.$$

By summing the terms, we find that this further implies the population moment equation,

$$E\left[\frac{1}{n}\sum_{i=1}^{n} z_i(y_i - x_i'\beta)\right] = E[\bar{m}(\beta)] = 0. \tag{10-17}$$

This relationship suggests how we might now proceed to estimate $\beta$. Note, in fact, that if $z_i = x_i$, then this is just the population counterpart to the least squares normal equations. So, as a guide to estimation, this would return us to least squares. Suppose we now translate this population expectation into a sample analog, and use that as our guide for estimation. That is, if the population relationship holds for the true parameter vector, $\beta$, suppose we attempt to mimic this result with a sample counterpart, or empirical moment equation,

$$\frac{1}{n}\sum_{i=1}^{n} z_i(y_i - x_i'\hat\beta) = \frac{1}{n}\sum_{i=1}^{n} m_i(\hat\beta) = \bar{m}(\hat\beta) = 0. \tag{10-18}$$

In the absence of other information about the data generating process, we can use the empirical moment equation as the basis of our estimation strategy.

The empirical moment condition is $L$ equations (the number of variables in $Z$) in $K$ unknowns (the number of parameters we seek to estimate). There are three possibilities to consider:

1. Underidentified: $L < K$. [...]
2. Exactly identified: $L = K$. [...]
3. Overidentified: if $L > K$, then there is no unique solution to the equation system $\bar{m}(\hat\beta) = 0$. In this instance, we need to formulate some strategy to choose an estimator. One intuitively appealing possibility which has served well thus far is "least squares." In this instance, that would mean choosing the estimator based on the criterion function

$$\min_{\beta}\; q = \bar{m}(\hat\beta)'\,\bar{m}(\hat\beta).$$

We do keep in mind that we will only be able to minimize this at some positive value; there is no exact solution to (10-18) in the overidentified case. Also, you can verify that if we treat the exactly identified case as if it were overidentified, that is, use least squares anyway, we will still obtain the IV estimator shown in (10-20) for the solution to case (2). For the overidentified case, the first order conditions are

$$\frac{\partial q}{\partial \beta} = 2\left(\frac{\partial \bar{m}'(\hat\beta)}{\partial \beta}\right)\bar{m}(\hat\beta) = 2\,\bar{G}(\hat\beta)'\,\bar{m}(\hat\beta) = 2\left(\frac{1}{n}X'Z\right)\left(\frac{1}{n}Z'y - \frac{1}{n}Z'X\hat\beta\right) = 0. \tag{10-21}$$

We leave as an exercise to show that the solution in both cases (2) and (3) is now

$$\hat\beta = [(X'Z)(Z'X)]^{-1}(X'Z)(Z'y). \tag{10-22}$$

The estimator in (10-22) is a hybrid that we have not encountered before, though if $L = K$, then it does reduce to the earlier one in (10-20). (In the overidentified case, (10-22) is not an IV estimator; it is, as we have sought, a method of moments estimator.)

It remains to establish consisten
cy and to obtain the asymptotic distribution and an asymptotic covariance matrix for the estimator. These are analyzed in detail in Chapter 18. Our purpose here is only to sketch the formal result, so we will merely claim the intermediate results we need:

ASSUMPTION GMM1. Convergence of the moments. The sample moment converges in probability to its population counterpart. That is, $\bar{m}(\beta) \to 0$. Different circumstances will produce different kinds of convergence, but we will require it in some form. For the simplest cases, such as a model of heteroscedasticity, this will be convergence in mean square. Certain time series models that involve correlated observations will necessitate some other form of convergence. But, in any of the cases we consider, we will require the general result, $\operatorname{plim} \bar{m}(\beta) = 0$.

ASSUMPTION GMM2. Identification. The parameters are identified in terms of the moment equations. Identification means, essentially, that a large enough sample will contain sufficient information for us actually to estimate $\beta$ consistently using the sample moments. There are two conditions which must be met: an order condition, which we have already assumed ($L \ge K$), and a rank condition, which states that the moment equations are not redundant. The rank condition implies the order condition, so we need only formalize it:

Identification condition for GMM Estimation: The $L \times K$ matrix

$$\Gamma(\beta) = E[\bar{G}(\beta)] = \operatorname{plim} \bar{G}(\beta) = \operatorname{plim} \frac{\partial \bar{m}}{\partial \beta'} = \operatorname{plim} \frac{1}{n}\sum_{i=1}^{n} \frac{\partial m_i}{\partial \beta'}$$

must have (full) row rank equal to $L$.¹⁰ Since this requires $L \ge K$, this implies the order condition. This assumption means that this derivative matrix converges in probability to its expectation. Note that we have assumed, in addition, that the derivatives, like the moments themselves, obey a law of large numbers: they converge in probability to their expectations.

ASSUMPTION GMM3. Limiting Normal Distribution for the Sample Moments. The sample moment obeys a central limit theorem or some similar variant. Since we are studying a generalized regression model, Lindeberg–Levy (D.19) will be too narrow, because the observations will have different variances. Lindeberg–Feller (D.19.A) suffices in the heteroscedasticity case, but in the general case, we will ultimately require something more general. These theorems are discussed in Section 12.4 and invoked in Chapter 18.

¹⁰ Strictly speaking, we only require that the row rank be at least as large as $K$, so there could be redundant, that is, functionally dependent, moments, so long as there are at least $K$ that are functionally independent. The case of $\operatorname{rank}(\Gamma)$ greater than or equal to $K$ but less than $L$ can be ignored.

It will follow from these assumptions (again, at this point we do this without proof) that the GMM estimators that we obtain are, in fact, consistent. By virtue of the Slutsky theorem, we can transfer our limiting results above to the empirical moment equations. A proof of consistency of the GMM estimator (pursued in Chapter 18) will be based on this result.

To obtain the asymptotic covariance matrix we will simply invoke a result we will obtain more formally in Chapter 18 for generalized method of moments estimators. That is,

$$\text{Asy.Var}[\hat\beta] = \frac{1}{n}\,[\Gamma'\Gamma]^{-1}\,\Gamma'\,\text{Asy.Var}[\sqrt{n}\,\bar{m}(\beta)]\,\Gamma\,[\Gamma'\Gamma]^{-1}.$$

For the particular model we are studying here,

$$\bar{m}(\beta) = (1/n)(Z'y - Z'X\beta),$$
$$\bar{G}(\beta) = (1/n)Z'X,$$
$$\Gamma(\beta) = Q_{ZX} \quad \text{(from Section 10.2.4)}.$$

(You should check in the preceding expression that the dimensions of the particular matrices and the dimensions of the various products produce the correctly configured matrix that we seek.) The remaining detail, which is the crucial one for the model we are examining, is for us to determine

$$V = \text{Asy.Var}[\sqrt{n}\,\bar{m}(\beta)].$$

Given the form of $\bar{m}(\beta)$,

$$V = \frac{1}{n}\operatorname{Var}\left[\sum_{i=1}^{n} z_i\varepsilon_i\right] = \frac{1}{n}\sum_{i=1}^{n}\sum_{j=1}^{n} \sigma^2\omega_{ij}\, z_i z_j' = \sigma^2\,\frac{Z'\Omega Z}{n}$$

for the most general case. Note that this is precisely the expression that appears in (10-6), so the question that arose there arises here once again. That is, under what conditions will this converge to a constant matrix? We take the discussion there as given. The only remaining detail is how to estimate this matrix. The answer appears in Section 10.3, where we pursued this same question in connection with robust estimation of the asymptotic covariance matrix of the least squares estimator. To review then, what we have achieved to this point is to provide a theoretical foundation for the instrumental variables estimator. As noted earlier, this specializes to the least squares estimator. The estimators of $V$ for our three cases will be

• Classical regression:

$$\hat{V} = \frac{(e'e/n)}{n}\sum_{i=1}^{n} z_i z_i' = \frac{(e'e/n)}{n}\,Z'Z$$

• Heteroscedastic:

$$\hat{V} = \frac{1}{n}\sum_{i=1}^{n} e_i^2\, z_i z_i' \tag{10-23}$$

• General:

$$\hat{V} = \frac{1}{n}\left[\sum_{t=1}^{n} e_t^2\, z_t z_t' + \sum_{l=1}^{L}\sum_{t=l+1}^{n}\left(1 - \frac{l}{L+1}\right) e_t e_{t-l}\,(z_t z_{t-l}' + z_{t-l} z_t')\right].$$

We should observe that in each of these cases, we have actually used some information about the structure of $\Omega$. If it is known only that the terms in $\bar{m}(\beta)$ are uncorrelated, then there is a convenient estimator available,

$$\hat{V} = \frac{1}{n}\sum_{i=1}^{n} m_i(\hat\beta)\, m_i(\hat\beta)',$$

that is, the natural, empirical variance estimator. Note that this is what is being used in the heteroscedasticity case directly above.

Collecting all the terms so far, then, we have

$$\text{Est.Asy.Var}[\hat\beta] = \frac{1}{n}\,[\bar{G}(\hat\beta)'\bar{G}(\hat\beta)]^{-1}\,\bar{G}(\hat\beta)'\,\hat{V}\,\bar{G}(\hat\beta)\,[\bar{G}(\hat\beta)'\bar{G}(\hat\beta)]^{-1} = n\,[(X'Z)(Z'X)]^{-1}(X'Z)\,\hat{V}\,(Z'X)\,[(X'Z)(Z'X)]^{-1}. \tag{10-24}$$

The preceding would seem to endow the least squares or method of moments estimators with some degree of optimality, but that is not the case. We have only provided them with a different statistical motivation (and established consistency). We now consider the question of whether, since this is the generalized regression model, there is some better (more efficient) means of using the data. As before, we merely sketch the results.

The class of minimum distance estimators is defined by the solutions to the criterion function

$$\min_{\beta}\; q = \bar{m}(\beta)'\,W\,\bar{m}(\beta),$$

where $W$ is any positive definite weighting matrix. Based on the assumptions made above, we will have the following theorem, which we claim without proof at this point:

THEOREM 10.5 Minimum Distance Estimators
If $\operatorname{plim} \bar{m}(\beta) = 0$ and if $W$ is a positive definite matrix, then $\operatorname{plim} \hat\beta = \operatorname{Argmin}[q = \bar{m}(\beta)'W\bar{m}(\beta)] = \beta$. The minimum distance estimator is consistent. It is also asymptotically normally distributed and has asymptotic covariance matrix

$$\text{Asy.Var}[\hat\beta_{MD}] = \frac{1}{n}\,[\bar{G}'W\bar{G}]^{-1}\,\bar{G}'WVW\bar{G}\,[\bar{G}'W\bar{G}]^{-1}.$$

Note that our entire preceding analysis was of the simplest minimum distance estimator, which has $W = I$. The obvious question now arises: if any $W$ produces a consistent estimator, is any $W$ better than any other one, or is it simply arbitrary? There is a firm answer, for which we have to consider two cases separately:

• Exactly identified case: If $L = K$; that is, if the number of moment conditions is the same as the number of parameters being estimated, then $W$ is irrelevant to the solution, so on the basis of simplicity alone, the optimal $W$ is $I$.
• Overidentified case: In this case, the "optimal" weighting matrix, that is, the $W$ which produces the most efficient e
stimator is $W = V^{-1}$. That is, the best weighting matrix is the inverse of the asymptotic covariance of the moment vector.

THEOREM 10.6 Generalized Method of Moments Estimator
The Minimum Distance Estimator obtained by using $W = V^{-1}$ is the Generalized Method of Moments, or GMM estimator. The GMM estimator is consistent, asymptotically normally distributed, and has asymptotic covariance matrix equal to

$$\text{Asy.Var}[\hat\beta_{GMM}] = \frac{1}{n}\,[\bar{G}'V^{-1}\bar{G}]^{-1}.$$

For the generalized regression model, these are

$$\hat\beta_{GMM} = [(X'Z)\hat{V}^{-1}(Z'X)]^{-1}(X'Z)\hat{V}^{-1}(Z'y)$$

and

$$\text{Asy.Var}[\hat\beta_{GMM}] = n\,[(X'Z)\hat{V}^{-1}(Z'X)]^{-1}.$$

We conclude this discussion by tying together what should seem to be a loose end. The GMM estimator is computed as the solution to

$$\min_{\beta}\; q = \bar{m}(\beta)'\left\{\text{Asy.Var}[\sqrt{n}\,\bar{m}(\beta)]\right\}^{-1}\bar{m}(\beta),$$

which suggests that the weighting matrix is a function of the thing we are trying to estimate. The process of GMM estimation will have to proceed in two steps: Step 1 is to obtain an estimate of $V$, then Step 2 will consist of using the inverse of this $V$ as the weighting matrix in computing the GMM estimator. We will return to this in Chapter 18, so we note directly that the following is a common strategy:

Step 1. Use $W = I$ to obtain a consistent estimator of $\beta$. Then, estimate $V$ with

$$\hat{V} = \frac{1}{n}\sum_{i=1}^{n} e_i^2\, z_i z_i'$$

in the heteroscedasticity case (i.e., the White estimator) or, for the more general case, the Newey–West estimator in (10-23).

Step 2. Use $W = \hat{V}^{-1}$ to compute the GMM estimator.

At this point, the observant reader should have noticed that in all of the preceding, we have never actually encountered the simple instrumental variables estimator that we introduced in Section 5.4. In order to obtain this estimator, we must revert to the classical, that is, homoscedastic and nonautocorrelated disturbances case. In that instance, the weighting matrix in Theorem 10.5 will be $W = (Z'Z)^{-1}$ and we will obtain the apparently missing result.

10.5 EFFICIENT ESTIMATION BY GENERALIZED LEAST SQUARES

Efficient estimation of $\beta$ in the generalized regression model requires knowledge of $\Omega$. To begin, it is useful to consider cases in which $\Omega$ is a known, symmetric, positive definite matrix. This assumption will occasionally be true, but in most models, $\Omega$ will contain unknown parameters that must also be estimated. We shall examine this case in Section 10.6.

10.5.1 GENERALIZED LEAST SQUARES (GLS)

Since $\Omega$ is a positive definite symmetric matrix, it can be factored into

$$\Omega = C\Lambda C',$$

where the columns of $C$ are the characteristic vectors of $\Omega$ and the characteristic roots of $\Omega$ are arrayed in the diagonal matrix $\Lambda$. Let $\Lambda^{1/2}$ be the diagonal matrix with $i$th diagonal element $\sqrt{\lambda_i}$, and let $T = C\Lambda^{1/2}$. Then $\Omega = TT'$. Also, let $P' = C\Lambda^{-1/2}$, so $\Omega^{-1} = P'P$. Premultiply the model in (10-1) by $P$ to obtain

$$Py = PX\beta + P\varepsilon$$

or

$$y_* = X_*\beta + \varepsilon_*. \tag{10-25}$$

The variance of $\varepsilon_*$ is

$$E[\varepsilon_*\varepsilon_*'] = P\sigma^2\Omega P' = \sigma^2 I,$$

so the classical regression model applies to this transformed model. Since $\Omega$ is known, $y_*$ and $X_*$ are observed data. In the classical model, ordinary least squares is efficient; hence,

$$\hat\beta = (X_*'X_*)^{-1}X_*'y_* = (X'P'PX)^{-1}X'P'Py = (X'\Omega^{-1}X)^{-1}X'\Omega^{-1}y$$

is the efficient estimator of $\beta$. This estimator is the generalized least squares (GLS) or Aitken (1935) estimator of $\beta$. This estimator is in contrast to the ordinary least squares (OLS) estimator, which uses a "weighting matrix," $I$, instead of $\Omega^{-1}$. By appealing to the classical regression model in (10-25), we have the following theorem, which includes the generalized regression model analogs to our results of Chapters 4 and 5.

THEOREM 10.7 Properties of the Generalized Least Squares Estimator
If $E[\varepsilon_* \mid X_*] = 0$, then

$$E[\hat\beta \mid X_*] = E[(X_*'X_*)^{-1}X_*'y_* \mid X_*] = \beta + E[(X_*'X_*)^{-1}X_*'\varepsilon_* \mid X_*] = \beta.$$

The GLS estimator $\hat\beta$ is unbiased. This result is equivalent to $E[P\varepsilon \mid PX] = 0$, but since $P$ is a matrix of known constants, we return to the familiar requirement $E[\varepsilon \mid X] = 0$. The requirement that the regressors and disturbances be uncorrelated is unchanged.

The GLS estimator is consistent if $\operatorname{plim}(1/n)X_*'X_* = Q_*$, where $Q_*$ is a finite positive definite matrix. Making the substitution, we see that this implies

$$\operatorname{plim}[(1/n)X'\Omega^{-1}X]^{-1} = Q_*^{-1}. \tag{10-26}$$

We require the transformed data $X_* = PX$, not the original data $X$, to be well behaved.¹¹ Under the assumption in (10-1), the following hold:

The GLS estimator is asymptotically normally distributed, with mean $\beta$ and sampling variance

$$\operatorname{Var}[\hat\beta \mid X_*] = \sigma^2(X_*'X_*)^{-1} = \sigma^2(X'\Omega^{-1}X)^{-1}. \tag{10-27}$$

The GLS estimator $\hat\beta$ is the minimum variance linear unbiased estimator in the generalized regression model. This statement follows by applying the Gauss–Markov theorem to the model in (10-25). The result in Theorem 10.7 is Aitken's (1935) Theorem, and $\hat\beta$ is sometimes called the Aitken estimator. This broad result includes
the Gauss–Markov theorem as a special case when $\Omega = I$.

For testing hypotheses, we can apply the full set of results in Chapter 6 to the transformed model in (10-25). For testing the $J$ linear restrictions, $R\beta = q$, the appropriate statistic is

$$F[J, n-K] = \frac{(R\hat\beta - q)'[R\hat\sigma^2(X_*'X_*)^{-1}R']^{-1}(R\hat\beta - q)}{J} = \frac{(\hat\varepsilon_c'\hat\varepsilon_c - \hat\varepsilon'\hat\varepsilon)/J}{\hat\sigma^2},$$

where the residual vector is $\hat\varepsilon = y_* - X_*\hat\beta$ and

$$\hat\sigma^2 = \frac{\hat\varepsilon'\hat\varepsilon}{n-K} = \frac{(y - X\hat\beta)'\Omega^{-1}(y - X\hat\beta)}{n-K}. \tag{10-28}$$

The constrained GLS residuals, $\hat\varepsilon_c = y_* - X_*\hat\beta_c$, are based on

$$\hat\beta_c = \hat\beta - [X'\Omega^{-1}X]^{-1}R'[R(X'\Omega^{-1}X)^{-1}R']^{-1}(R\hat\beta - q).^{12}$$

¹¹ Once again, to allow a time trend, we could weaken this assumption a bit.
¹² Note that this estimator is the constrained OLS estimator using the transformed data.

To summarize, all the results for the classical model, including the usual inference procedures, apply to the transformed model in (10-25).

There is no precise counterpart to $R^2$ in the generalized regression model. Alternatives have been proposed, but care must be taken when using them. For example, one choice is the $R^2$ in the transformed regression, (10-25). But this regression need not have a constant term, so the $R^2$ is not bounded by zero and one. Even if there is a constant term, the transformed regression is a computational device, not the model of interest. That a good (or bad) fit is obtained in the "model" in (10-25) may be of no interest; the dependent variable in that model, $y_*$, is different from the one in the model as originally specified. The usual $R^2$ often suggests that the fit of the model is improved by a correction for heteroscedasticity and degraded by a correction for autocorrelation, but both changes can often be attributed to the computation of $y_*$. A more appealing fit measure might be based on the residuals from the original model once the GLS estimator is in hand, such as

$$R_G^2 = 1 - \frac{(y - X\hat\beta)'(y - X\hat\beta)}{\sum_{i=1}^{n}(y_i - \bar{y})^2}.$$

Like the earlier contender, however, this measure is not bounded in the unit interval. In addition, this measure cannot be reliably used to compare models. The generalized least squares estimator minimizes the generalized sum of squares

$$\varepsilon_*'\varepsilon_* = (y - X\beta)'\Omega^{-1}(y - X\beta),$$

not $\varepsilon'\varepsilon$. As such, there is no assurance, for example, that dropping a variable from the model will result in a decrease in $R_G^2$, as it will in $R^2$. Other goodness-of-fit measures, designed primarily to be a function of the sum of squared residuals (raw or weighted by $\Omega^{-1}$) and to be bounded by zero and one, have been proposed.¹³ Unfortunately, they all suffer from at least one of the previously noted shortcomings. The $R^2$-like measures in this setting are purely descriptive.

10.5.2 FEASIBLE GENERALIZED LEAST SQUARES

To use the results of Section 10.5.1, $\Omega$ must be known. If $\Omega$ contains unknown parameters that must be estimated, then generalized least squares is not feasible. But with an unrestricted $\Omega$, there are $n(n+1)/2$ additional parameters in $\sigma^2\Omega$. This number is far too many to estimate with $n$ observations. Obviously, some structure must be imposed on the model if we are to proceed.

The typical problem involves a small set of parameters $\theta$ such that $\Omega = \Omega(\theta)$. A commonly used formula in time series settings is

$$\Omega(\rho) = \begin{bmatrix} 1 & \rho & \rho^2 & \rho^3 & \cdots & \rho^{n-1} \\ \rho & 1 & \rho & \rho^2 & \cdots & \rho^{n-2} \\ & & & \vdots & & \\ \rho^{n-1} & \rho^{n-2} & & \cdots & & 1 \end{bmatrix},$$

which involves only one additional unknown parameter. A model of heteroscedasticity that also has only one new parameter is

$$\sigma_i^2 = \sigma^2 z_i^{\theta}. \tag{10-29}$$

¹³ See, for example, Judge et al. (1985, p. 32) and Buse (1973).

Suppose, then, that $\hat\theta$ is a consistent estimator of $\theta$. (We consider later how such an estimator might be obtained.) To make GLS estimation feasible, we shall use $\hat\Omega = \Omega(\hat\theta)$ instead of the true $\Omega$. The issue we consider here is whether using $\Omega(\hat\theta)$ requires us to change any of the results of Section 10.5.1.

It would seem that if $\operatorname{plim} \hat\theta = \theta$, then using $\hat\Omega$ is asymptotically equivalent to using the true $\Omega$.¹⁴ Let the feasible generalized least squares (FGLS) estimator be denoted

$$\hat{\hat\beta} = (X'\hat\Omega^{-1}X)^{-1}X'\hat\Omega^{-1}y.$$

Conditions that imply that $\hat{\hat\beta}$ is asymptotically equivalent to $\hat\beta$ are

$$\operatorname{plim}\left[\frac{1}{n}X'\hat\Omega^{-1}X - \frac{1}{n}X'\Omega^{-1}X\right] = 0 \tag{10-30}$$

and

$$\operatorname{plim}\left[\frac{1}{\sqrt{n}}X'\hat\Omega^{-1}\varepsilon - \frac{1}{\sqrt{n}}X'\Omega^{-1}\varepsilon\right] = 0. \tag{10-31}$$

The first of these equations states that if the weighted sum of squares matrix based on the true $\Omega$ converges to a positive definite matrix, then the one based on $\hat\Omega$ converges to the same matrix. We are assuming that this is true. In the second condition, if the transformed regressors are well behaved, then the right-hand side sum will have a limiting normal distribution. This condition is exactly the one we used in Chapter 5 to obtain the asymptotic distribution of the least squares estimator; here we are using the same results for $X_*$ and $\varepsilon_*$. Therefore, (10-31) requires the same condition to hold when $\Omega$ is replaced with $\hat\Omega$.¹⁵

These conditions, in principl
e, must be verified on a case-by-case basis. Fortunately, in most familiar settings, they are met. If we assume that they are, then the FGLS estimator based on $\hat\theta$ has the same asymptotic properties as the GLS estimator. This result is extremely useful. Note, especially, the following theorem.

THEOREM 10.8 Efficiency of the FGLS Estimator
An asymptotically efficient FGLS estimator does not require that we have an efficient estimator of $\theta$; only a consistent one is required to achieve full efficiency for the FGLS estimator.

¹⁴ This equation is sometimes denoted $\operatorname{plim} \hat\Omega = \Omega$. Since $\Omega$ is $n \times n$, it cannot have a probability limit. We use this term to indicate convergence element by element.
¹⁵ The condition actually requires only that if the right-hand sum has any limiting distribution, then the left-hand one has the same one. Conceivably, this distribution might not be the normal distribution, but that seems unlikely except in a specially constructed, theoretical case.

Except for the simplest cases, the finite-sample properties and exact distributions of FGLS estimators are unknown. The asymptotic efficiency of FGLS estimators may not carry over to small samples because of the variability introduced by the estimated $\Omega$. Some analyses for the case of heteroscedasticity are given by Taylor (1977). A model of autocorrelation is analyzed by Griliches and Rao (1969). In both studies, the authors find that, over a broad range of parameters, FGLS is more efficient than least squares. But if the departure from the classical assumptions is not too severe, then least squares may be more efficient than FGLS in a small sample.

10.6 MAXIMUM LIKELIHOOD ESTIMATION

This section considers efficient estimation when the disturbances are normally distributed. As before, we consider two cases: first, to set the stage, the benchmark case of known $\Omega$, and, second, the more common case of unknown $\Omega$.¹⁶

If the disturbances are multivariate normally distributed, then the log-likelihood function for the sample is

$$\ln L = -\frac{n}{2}\ln(2\pi) - \frac{n}{2}\ln\sigma^2 - \frac{1}{2\sigma^2}(y - X\beta)'\Omega^{-1}(y - X\beta) - \frac{1}{2}\ln|\Omega|. \tag{10-32}$$

Since $\Omega$ is a matrix of known constants, the maximum likelihood estimator of $\beta$ is the vector that minimizes the generalized sum of squares,

$$S_*(\beta) = (y - X\beta)'\Omega^{-1}(y - X\beta)$$

(hence the name generalized least squares). The necessary conditions for maximizing $L$ are

$$\frac{\partial \ln L}{\partial \beta} = \frac{1}{\sigma^2}X'\Omega^{-1}(y - X\beta) = \frac{1}{\sigma^2}X_*'(y_* - X_*\beta) = 0,$$

$$\frac{\partial \ln L}{\partial \sigma^2} = -\frac{n}{2\sigma^2} + \frac{1}{2\sigma^4}(y - X\beta)'\Omega^{-1}(y - X\beta) = -\frac{n}{2\sigma^2} + \frac{1}{2\sigma^4}(y_* - X_*\beta)'(y_* - X_*\beta) = 0. \tag{10-33}$$

The solutions are the OLS estimators using the transformed data:

$$\hat\beta_{ML} = (X_*'X_*)^{-1}X_*'y_* = (X'\Omega^{-1}X)^{-1}X'\Omega^{-1}y, \tag{10-34}$$

$$\hat\sigma^2_{ML} = \frac{1}{n}(y_* - X_*\hat\beta)'(y_* - X_*\hat\beta) = \frac{1}{n}(y - X\hat\beta)'\Omega^{-1}(y - X\hat\beta), \tag{10-35}$$

which implies that with normally distributed disturbances, generalized least squares is also maximum likelihood. As in the classical regression model, the maximum likelihood estimator of $\sigma^2$ is biased. An unbiased estimator is the one in (10-28). The conclusion, which would be expected, is that when $\Omega$ is known, the maximum likelihood estimator is generalized least squares.

¹⁶ The method of maximum likelihood estimation is developed in Chapter 17.

When $\Omega$ is unknown and must be estimated, then it is necessary to maximize the log likelihood in (10-32) with respect to the full set of parameters $[\beta, \sigma^2, \Omega]$ simultaneously. Since an unrestricted $\Omega$ alone contains $n(n+1)/2 - 1$ parameters, it is clear that some restriction will have to be placed on the structure of $\Omega$ in order for estimation to proceed. We will examine several applications in which $\Omega = \Omega(\theta)$ for some smaller vector of parameters in the next two chapters, so we will note only a few general results at this point.

(a) For a given value of $\theta$ the estimator of $\beta$ would be feasible GLS and the estimator of $\sigma^2$ would be the estimator in (10-35).
(b) The likelihood equations for $\theta$ will generally be complicated functions of $\beta$ and $\sigma^2$, so joint estimation will be necessary. However, in many cases, for given values of $\beta$ and $\sigma^2$, the estimator of $\theta$ is straightforward. For example, in the model of (10-29), the iterated estimator of $\theta$ when $\beta$ and $\sigma^2$ and a prior value of $\theta$ are given is the prior value plus the slope in the regression of $(e_i^2/\hat\sigma_i^2 - 1)$ on $z_i$.

The second step suggests a sort of back and forth iteration for this model that will work in many situations: starting with, say, OLS, iterating back and forth between (a) and (b) until convergence will produce the joint maximum likelihood estimator. This situation was examined by Oberhofer and Kmenta (1974), who showed that under some fairly weak requirements, most importantly that $\theta$ not involve $\sigma^2$ or any of the parameters in $\beta$, this procedure would produce the maximum likelihood estimat
or. Another implication of this formulation which is simple to show (we leave it as an exercise) is that under the Oberhofer and Kmenta assumption, the asymptotic covariance matrix of the estimator is the same as that of the GLS estimator. This is the same whether $\Omega$ is known or estimated, which means that if $\theta$ and $\beta$ have no parameters in common, then exact knowledge of $\Omega$ brings no gain in asymptotic efficiency in the estimation of $\beta$ over estimation of $\beta$ with a consistent estimator of $\Omega$.

10.7 SUMMARY AND CONCLUSIONS

This chapter has introduced a major extension of the classical linear model. By allowing for heteroscedasticity and autocorrelation in the disturbances, we expand the range of models to a large array of frameworks. We will explore these in the next several chapters. The formal concepts introduced in this chapter include how this extension affects the properties of the least squares estimator, how an appropriate estimator of the asymptotic covariance matrix of the least squares estimator can be computed in this extended modeling framework, and, finally, how to use the information about the variances and covariances of the disturbances to obtain an estimator that is more efficient than ordinary least squares.

Key Terms and Concepts

• Aitken's Theorem
• Asymptotic properties
• Autocorrelation
• Efficient estimator
• Feasible GLS
• Finite sample properties
• Generalized least squares (GLS)
• Generalized regression model
• GMM estimator
• Heteroscedasticity
• Instrumental variables estimator
• Method of moments estimator
• Newey–West estimator
• Nonlinear least squares estimator
• Order condition
• Ordinary least squares (OLS)
• Orthogonality condition
• Panel data
• Parametric
• Population moment equation
• Rank condition
• Robust estimation
• Semiparametric
• Weighting matrix
• White estimator

Exercises

1. What is the covariance matrix, $\operatorname{Cov}[\hat\beta, \hat\beta - b]$, of the GLS estimator $\hat\beta = (X'\Omega^{-1}X)^{-1}X'\Omega^{-1}y$ and the difference between it and the OLS estimator, $b = (X'X)^{-1}X'y$? The result plays a pivotal role in the development of specification tests in Hausman (1978).

2. This and the next two exercises are based on the test statistic usually used to test a set of $J$ linear restrictions in the generalized regression model:

$$F[J, n-K] = \frac{(R\hat\beta - q)'[R(X'\Omega^{-1}X)^{-1}R']^{-1}(R\hat\beta - q)/J}{(y - X\hat\beta)'\Omega^{-1}(y - X\hat\beta)/(n-K)},$$

where $\hat\beta$ is the GLS estimator. Show that if $\Omega$ is known, if the disturbances are normally distributed and if the null hypothesis, $R\beta = q$, is true, then this statistic is exactly distributed as $F$ with $J$ and $n-K$ degrees of freedom. What assumptions about the regressors are needed to reach this conclusion? Need they be nonstochastic?

3. Now suppose that the disturbances are not normally distributed, although $\Omega$ is still known. Show that the limiting distribution of the previous statistic is $(1/J)$ times a chi-squared variable with $J$ degrees of freedom. (Hint: The denominator converges to $\sigma^2$.) Conclude that in the generalized regression model, the limiting distribution of the Wald statistic

$$W = (R\hat\beta - q)'\left\{R\,(\text{Est.Var}[\hat\beta])\,R'\right\}^{-1}(R\hat\beta - q)$$

is chi-squared with $J$ degrees of freedom, regardless of the distribution of the disturbances, as long as the data are otherwise well behaved. Note that in a finite sample, the true distribution may be approximated with an $F[J, n-K]$ distribution. It is a bit ambiguous, however, to interpret this fact as implying that the statistic is asymptotically distributed as $F$ with $J$ and $n-K$ degrees of freedom, because the limiting distribution used to obtain our result is the chi-squared, not the $F$. In this instance, the $F[J, n-K]$ is a random variable that tends asymptotically to the chi-squared variate.

4. Finally, suppose that $\Omega$ must be estimated, but that assumptions (10-30) and (10-31) are met by the estimator. What changes are required in the development of the previous problem?

5. In the generalized regression model, if the $K$ columns of $X$ are characteristic vectors of $\Omega$, then ordinary least squares and generalized least squares are identical. (The result is actually a bit broader; $X$ may be any linear combination of exactly $K$ characteristic vectors. This result is Kruskal's Theorem.)
a. Prove the result directly using matrix algebra.
b. Prove that if $X$ contains a constant term and if the remaining columns are in deviation form (so that the column sum is zero), then the model of Exercise 8 below is one of these cases. (The seemingly unrelated regressions model with identical regressor matrices, discussed in Chapter 14, is another.)

6. In the generalized regression model, suppose that $\Omega$ is known.
a. What is the covariance matrix of the OLS and GLS estimato
rsofβ?b.WhatisthecovariancematrixoftheOLSresidualvectore=y−Xb?c.WhatisthecovariancematrixoftheGLSresidualvectorεˆ=y−Xβˆ?d.WhatisthecovariancematrixoftheOLSandGLSresidualvectors?−y/(βx)7.Supposethatyhasthepdff(y|x)=(1/xβ)e,y>0.ThenE[y|x]=βxandVar[y|x]=(βx)2.Forthismodel,provethatGLSandMLEarethesame,eventhoughthisdistributioninvolvesthesameparametersintheconditionalmeanfunctionandthedisturbancevariance.8.Supposethattheregressionmodelisy=µ+ε,whereεhasazeromean,constantvariance,andequalcorrelationρacrossobservations.ThenCov[ε,ε]=σ2ρifiji=j.Provethattheleastsquaresestimatorofµisinconsistent.Findthecharac-teristicrootsofandshowthatCondition2.afterTheorem10.2isviolated.\nGreene-50240bookJune17,200216:2111HETEROSCEDASTICITYQ11.1INTRODUCTIONRegressiondisturbanceswhosevariancesarenotconstantacrossobservationsarehet-eroscedastic.Heteroscedasticityarisesinnumerousapplications,inbothcross-sectionandtime-seriesdata.Forexample,evenafteraccountingforfirmsizes,weexpecttoobservegreatervariationintheprofitsoflargefirmsthaninthoseofsmallones.Thevari-anceofprofitsmightalsodependonproductdiversification,researchanddevelopmentexpenditure,andindustrycharacteristicsandthereforemightalsovaryacrossfirmsofsimilarsizes.Whenanalyzingfamilyspendingpatterns,wefindthatthereisgreatervari-ationinexpenditureoncertaincommoditygroupsamonghigh-incomefamiliesthanlowonesduetothegreaterdiscretionallowedbyhigherincomes.1Intheheteroscedasticregressionmodel,Var[ε|x]=σ2,i=1,...,n.iiiWecontinuetoassumethatthedisturbancesarepairwiseuncorrelated.Thus,2ω100···0σ100···00ω20···0σ20···222E[εε|X]=σ=σ..=.....000···ωn000···σ2nItwillsometimesproveusefultowriteσ2=σ2ω.Thisformisanarbitraryscalingiiwhichallowsustouseanormalization,ntr()=ωi=ni=1Thismakestheclassicalregressionwithhomoscedasticdisturbancesasimplespecialcasewithωi=1,i=1,...,n.Intuitively,onemightthenthinkoftheωsasweightsthatarescaledinsuchawayastoreflectonlythevarietyinthedisturbancevariances.Thescalefactorσ2thenprovidestheoverallscalingofthedisturbanceprocess.E
xample11.1HeteroscedasticRegressionThedatainAppendixTableF9.1givemonthlycreditcardexpenditurefor100individuals,sampledfromalargersampleof13,444people.Linearregressionofmonthlyexpenditureonaconstant,age,incomeanditssquare,andadummyvariableforhomeownershipusingthe72observationsforwhichexpenditurewasnonzeroproducestheresidualsplottedinFig-ure11.1.Thepatternoftheresidualsischaracteristicofaregressionwithheteroscedasticity.1PraisandHouthakker(1955).215\nGreene-50240bookJune17,200216:21216CHAPTER11✦Heteroscedasticity200015001000U5000500024681012IncomeFIGURE11.1PlotofResidualsAgainstIncome.Thischapterwillpresenttheheteroscedasticregressionmodel,firstingeneralterms,thenwithsomespecificformsofthedisturbancecovariancematrix.Webeginbyex-aminingtheconsequencesofheteroscedasticityforleastsquaresestimation.Wethenconsiderrobustestimation,intwoframeworks.Section11.2presentsappropriateesti-matorsoftheasymptoticcovariancematrixoftheleastsquaresestimator.Section11.3discussesGMMestimation.Sections11.4to11.7presentmorespecificformulationsofthemodel.Sections11.4and11.5considergeneralized(weighted)leastsquares,whichrequiresknowledgeatleastoftheformof.Section11.7presentsmaximumlikelihoodestimatorsfortwospecificwidelyusedmodelsofheteroscedasticity.Recentanalysesoffinancialdata,suchasexchangerates,thevolatilityofmarketreturns,andinflation,havefoundabundantevidenceofclusteringoflargeandsmalldisturbances,2whichsuggestsaformofheteroscedasticityinwhichthevarianceofthedisturbancedependsonthesizeoftheprecedingdisturbance.Engle(1982)suggestedtheAutoRegressive,ConditionallyHeteroscedastic,orARCH,modelasanalternativetothestandardtime-seriestreatments.WewillexaminetheARCHmodelinSection11.8.11.2ORDINARYLEASTSQUARESESTIMATIONWeshowedinSection10.2thatinthepresenceofheteroscedasticity,theleastsquaresestimatorbisstillunbiased,consistent,andasymptoticallynormallydistributed.The2PioneeringstudiesintheanalysisofmacroeconomicdataincludeEngle(1982,1983)andCragg(1982).\nGreene-50240bookJune17,200216:21CHAPTER11✦
The asymptotic covariance matrix is

$$\text{Asy. Var}[b] = \frac{\sigma^2}{n}\left(\text{plim}\,\frac{1}{n}X'X\right)^{-1}\left(\text{plim}\,\frac{1}{n}X'\Omega X\right)\left(\text{plim}\,\frac{1}{n}X'X\right)^{-1}.$$

Estimation of the asymptotic covariance matrix would be based on

$$\text{Var}[b \mid X] = (X'X)^{-1}\left(\sigma^2\sum_{i=1}^{n}\omega_i x_i x_i'\right)(X'X)^{-1}.$$

[See (10-5).] Assuming, as usual, that the regressors are well behaved, so that $(X'X/n)^{-1}$ converges to a positive definite matrix, we find that the mean square consistency of b depends on the limiting behavior of the matrix:

$$Q_n^* = \frac{X'\Omega X}{n} = \frac{1}{n}\sum_{i=1}^{n}\omega_i x_i x_i'. \tag{11-1}$$

If $Q_n^*$ converges to a positive definite matrix $Q^*$, then as $n \to \infty$, b will converge to $\beta$ in mean square. Under most circumstances, if $\omega_i$ is finite for all i, then we would expect this result to be true. Note that $Q_n^*$ is a weighted sum of the squares and cross products of x with weights $\omega_i/n$, which sum to 1. We have already assumed that another weighted sum, $X'X/n$, in which the weights are $1/n$, converges to a positive definite matrix Q, so it would be surprising if $Q_n^*$ did not converge as well. In general, then, we would expect that

$$b \overset{a}{\sim} N\left[\beta,\; \frac{\sigma^2}{n}Q^{-1}Q^*Q^{-1}\right], \quad \text{with } Q^* = \text{plim}\, Q_n^*.$$

A formal proof is based on Section 5.2 with $Q_i = \omega_i x_i x_i'$.

11.2.1 INEFFICIENCY OF LEAST SQUARES

It follows from our earlier results that b is inefficient relative to the GLS estimator. By how much will depend on the setting, but there is some generality to the pattern. As might be expected, the greater is the dispersion in $\omega_i$ across observations, the greater the efficiency of GLS over OLS. The impact of this on the efficiency of estimation will depend crucially on the nature of the disturbance variances. In the usual cases, in which $\omega_i$ depends on variables that appear elsewhere in the model, the greater is the dispersion in these variables, the greater will be the gain to using GLS.

It is important to note, however, that both these comparisons are based on knowledge of $\Omega$. In practice, one of two cases is likely to be true. If we do have detailed knowledge of $\Omega$, the performance of the inefficient estimator is a moot point. We will use GLS or feasible GLS anyway. In the more common case, we will not have detailed knowledge of $\Omega$, so the comparison is not possible.

11.2.2 THE ESTIMATED COVARIANCE MATRIX OF b

If the type of heteroscedasticity is known with certainty, then the ordinary least squares estimator is undesirable; we should use generalized least squares instead. The precise form of the heteroscedasticity is usually unknown, however. In that case, generalized least squares is not usable, and we may need to salvage what we can from the results of ordinary least squares.
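The conventionally estimated covariance matrix for the least squares estimator, $\sigma^2(X'X)^{-1}$, is inappropriate; the appropriate matrix is $\sigma^2(X'X)^{-1}(X'\Omega X)(X'X)^{-1}$. It is unlikely that these two would coincide, so the usual estimators of the standard errors are likely to be erroneous. In this section, we consider how erroneous the conventional estimator is likely to be.

As usual,

$$s^2 = \frac{e'e}{n-K} = \frac{\varepsilon' M \varepsilon}{n-K}, \tag{11-2}$$

where $M = I - X(X'X)^{-1}X'$. Expanding this equation, we obtain

$$s^2 = \frac{\varepsilon'\varepsilon}{n-K} - \frac{\varepsilon' X(X'X)^{-1}X'\varepsilon}{n-K}. \tag{11-3}$$

Taking the two parts separately yields

$$E\left[\frac{\varepsilon'\varepsilon}{n-K}\,\Big|\,X\right] = \frac{\text{tr}\,E[\varepsilon\varepsilon' \mid X]}{n-K} = \frac{n\sigma^2}{n-K}. \tag{11-4}$$

[We have used the scaling $\text{tr}(\Omega) = n$.] In addition,

$$E\left[\frac{\varepsilon' X(X'X)^{-1}X'\varepsilon}{n-K}\,\Big|\,X\right] = \frac{\text{tr}\left\{E\left[(X'X)^{-1}X'\varepsilon\varepsilon' X \mid X\right]\right\}}{n-K} = \frac{\text{tr}\left[\sigma^2\left(\frac{X'X}{n}\right)^{-1}\left(\frac{X'\Omega X}{n}\right)\right]}{n-K} = \frac{\sigma^2}{n-K}\,\text{tr}\left[\left(\frac{X'X}{n}\right)^{-1}Q_n^*\right], \tag{11-5}$$

where $Q_n^*$ is defined in (11-1). As $n \to \infty$, the term in (11-4) will converge to $\sigma^2$. The term in (11-5) will converge to zero if b is consistent because both matrices in the product are finite. Therefore:

If b is consistent, then $\lim_{n\to\infty} E[s^2] = \sigma^2$.

It can also be shown (we leave it as an exercise) that if the fourth moment of every disturbance is finite and all our other assumptions are met, then

$$\lim_{n\to\infty}\text{Var}\left[\frac{e'e}{n-K}\right] = \lim_{n\to\infty}\text{Var}\left[\frac{\varepsilon'\varepsilon}{n-K}\right] = 0.$$

This result implies, therefore, that:

If plim $b = \beta$, then plim $s^2 = \sigma^2$.

Before proceeding, it is useful to pursue this result. The normalization $\text{tr}(\Omega) = n$ implies that

$$\sigma^2 = \bar\sigma^2 = \frac{1}{n}\sum_i \sigma_i^2 \quad \text{and} \quad \omega_i = \frac{\sigma_i^2}{\bar\sigma^2}.$$

Therefore, our previous convergence result implies that the least squares estimator $s^2$ converges to plim $\bar\sigma^2$, that is, the probability limit of the average variance of the disturbances, assuming that this probability limit exists. Thus, some further assumption about these variances is necessary to obtain the result. (For an application, see Exercise 5 in Chapter 13.)

The difference between the conventional estimator and the appropriate (true) covariance matrix for b is

$$\text{Est. Var}[b \mid X] - \text{Var}[b \mid X] = s^2(X'X)^{-1} - \sigma^2(X'X)^{-1}(X'\Omega X)(X'X)^{-1}. \tag{11-6}$$

In a large sample (so that $s^2 \approx \sigma^2$), this difference is approximately equal to

$$D = \frac{\sigma^2}{n}\left(\frac{X'X}{n}\right)^{-1}\left[\frac{X'X}{n} - \frac{X'\Omega X}{n}\right]\left(\frac{X'X}{n}\right)^{-1}. \tag{11-7}$$

The difference between the two matrices hinges on

$$\Delta = \frac{X'X}{n} - \frac{X'\Omega X}{n} = \sum_{i=1}^{n}\frac{1}{n}x_i x_i' - \sum_{i=1}^{n}\frac{\omega_i}{n}x_i x_i' = \frac{1}{n}\sum_{i=1}^{n}(1-\omega_i)x_i x_i', \tag{11-8}$$

where $x_i'$ is the ith row of X.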
These are two weighted averages of the matrices $Q_i = x_i x_i'$, using weights 1 for the first term and $\omega_i$ for the second. The scaling $\text{tr}(\Omega) = n$ implies that $\sum_i(\omega_i/n) = 1$. Whether the weighted average based on $\omega_i/n$ differs much from the one using $1/n$ depends on the weights. If the weights are related to the values in $x_i$, then the difference can be considerable. If the weights are uncorrelated with $x_i x_i'$, however, then the weighted average will tend to equal the unweighted average. [3]

Therefore, the comparison rests on whether the heteroscedasticity is related to any of $x_k$ or $x_j \times x_k$. The conclusion is that, in general: If the heteroscedasticity is not correlated with the variables in the model, then at least in large samples, the ordinary least squares computations, although not the optimal way to use the data, will not be misleading. For example, in the groupwise heteroscedasticity model of Section 11.7.2, if the observations are grouped in the subsamples in a way that is unrelated to the variables in X, then the usual OLS estimator of Var[b] will, at least in large samples, provide a reliable estimate of the appropriate covariance matrix. It is worth remembering, however, that the least squares estimator will be inefficient, the more so the larger are the differences among the variances of the groups. [4]

The preceding is a useful result, but one should not be overly optimistic. First, it remains true that ordinary least squares is demonstrably inefficient. Second, if the primary assumption of the analysis (that the heteroscedasticity is unrelated to the variables in the model) is incorrect, then the conventional standard errors may be quite far from the appropriate values.

[3] Suppose, for example, that X contains a single column and that both $x_i$ and $\omega_i$ are independent and identically distributed random variables. Then $x'x/n$ converges to $E[x_i^2]$, whereas $x'\Omega x/n$ converges to $\text{Cov}[\omega_i, x_i^2] + E[\omega_i]E[x_i^2]$. $E[\omega_i] = 1$, so if $\omega$ and $x^2$ are uncorrelated, then the sums have the same probability limit.

[4] Some general results, including analysis of the properties of the estimator based on estimated variances, are given in Taylor (1977).
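The conclusion of Section 11.2.2 can be checked numerically. The following is a minimal sketch rather than anything taken from the text: it assumes a simulated design with a constant and one uniformly distributed regressor, known weights $\omega_i$ scaled so that $\text{tr}(\Omega) = n$, and $\sigma^2 = 1$. The function name `ols_covariances` and all of the data are hypothetical. It compares the conventional matrix $\sigma^2(X'X)^{-1}$ with the appropriate matrix $\sigma^2(X'X)^{-1}(X'\Omega X)(X'X)^{-1}$ when the weights are unrelated to x and when they are proportional to $x^2$.

```python
import numpy as np


def ols_covariances(X, omega, sigma2=1.0):
    """Return (conventional, appropriate) Var[b|X] for known weights omega."""
    XtX_inv = np.linalg.inv(X.T @ X)
    conventional = sigma2 * XtX_inv
    XtOX = X.T @ (omega[:, None] * X)  # X' Omega X for a diagonal Omega
    appropriate = sigma2 * XtX_inv @ XtOX @ XtX_inv
    return conventional, appropriate


rng = np.random.default_rng(0)  # simulated data, for illustration only
n = 5000
x = rng.uniform(1.0, 5.0, size=n)
X = np.column_stack([np.ones(n), x])

weights = {
    "unrelated": rng.uniform(0.5, 1.5, size=n),  # drawn independently of x
    "related": x**2,                             # variance proportional to x^2
}

ratios = {}
for label, omega in weights.items():
    omega = omega * n / omega.sum()  # impose the scaling tr(Omega) = n
    conventional, appropriate = ols_covariances(X, omega)
    # ratio of conventional to appropriate standard errors, by coefficient
    ratios[label] = np.sqrt(np.diag(conventional) / np.diag(appropriate))
    print(label, np.round(ratios[label], 3))
```

In this design the ratios should be close to one when the weights are unrelated to x, while the conventional slope standard error is too small when $\omega_i$ is proportional to $x_i^2$.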
11.2.3 ESTIMATING THE APPROPRIATE COVARIANCE MATRIX FOR ORDINARY LEAST SQUARES

It is clear from the preceding that heteroscedasticity has some potentially serious implications for inferences based on the results of least squares. The application of more appropriate estimation techniques requires a detailed formulation of $\Omega$, however. It may well be that the form of the heteroscedasticity is unknown. White (1980) has shown that it is still possible to obtain an appropriate estimator for the variance of the least squares estimator, even if the heteroscedasticity is related to the variables in X. The White estimator [see (10-14) in Section 10.3 [5]]

$$\text{Est. Asy. Var}[b] = \frac{1}{n}\left(\frac{X'X}{n}\right)^{-1}\left(\frac{1}{n}\sum_{i=1}^{n}e_i^2 x_i x_i'\right)\left(\frac{X'X}{n}\right)^{-1}, \tag{11-9}$$

where $e_i$ is the ith least squares residual, can be used as an estimate of the asymptotic variance of the least squares estimator.

A number of studies have sought to improve on the White estimator for OLS. [6] The asymptotic properties of the estimator are unambiguous, but its usefulness in small samples is open to question. The possible problems stem from the general result that the squared OLS residuals tend to underestimate the squares of the true disturbances. [That is why we use $1/(n-K)$ rather than $1/n$ in computing $s^2$.] The end result is that in small samples, at least as suggested by some Monte Carlo studies [e.g., MacKinnon and White (1985)], the White estimator is a bit too optimistic; the matrix is a bit too small, so asymptotic t ratios are a little too large. Davidson and MacKinnon (1993, p. 554) suggest a number of fixes, which include (1) scaling up the end result by a factor $n/(n-K)$ and (2) using the squared residual scaled by its true variance, $e_i^2/m_{ii}$, instead of $e_i^2$, where $m_{ii} = 1 - x_i'(X'X)^{-1}x_i$. [7] [See (4-20).] On the basis of their study, Davidson and MacKinnon strongly advocate one or the other correction. Their admonition "One should never use [the White estimator] because [(2)] always performs better" seems a bit strong, but the point is well taken. The use of sharp asymptotic results in small samples can be problematic. The last two rows of Table 11.1 show the recomputed standard errors with these two modifications.
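The White estimator in (11-9) and the two Davidson and MacKinnon corrections can be sketched in a few lines. The code below is an illustration under stated assumptions, not the computation behind Table 11.1: the data are simulated, and the function name `white_covariances` and its dictionary keys are hypothetical. It uses the fact that (11-9) is algebraically equal to $(X'X)^{-1}\left[\sum_i e_i^2 x_i x_i'\right](X'X)^{-1}$.

```python
import numpy as np


def white_covariances(X, y):
    """OLS coefficients with White and Davidson-MacKinnon covariance estimates."""
    n, K = X.shape
    XtX_inv = np.linalg.inv(X.T @ X)
    b = XtX_inv @ (X.T @ y)
    e = y - X @ b

    def sandwich(u):
        # (X'X)^{-1} [sum_i u_i x_i x_i'] (X'X)^{-1}
        return XtX_inv @ (X.T @ (u[:, None] * X)) @ XtX_inv

    # m_ii = 1 - x_i'(X'X)^{-1} x_i
    m = 1.0 - np.einsum("ij,jk,ik->i", X, XtX_inv, X)
    return {
        "b": b,
        "white": sandwich(e**2),              # (11-9)
        "dm1": n / (n - K) * sandwich(e**2),  # correction (1): scale by n/(n-K)
        "dm2": sandwich(e**2 / m),            # correction (2): e_i^2 / m_ii
    }


rng = np.random.default_rng(1)  # hypothetical data
n = 72
income = rng.uniform(1.0, 10.0, size=n)
X = np.column_stack([np.ones(n), income, income * income])
y = 1.0 + 2.0 * income + rng.normal(size=n) * income  # s.d. rises with income
res = white_covariances(X, y)
for name in ("white", "dm1", "dm2"):
    print(name, np.round(np.sqrt(np.diag(res[name])), 4))
```

Because $m_{ii}$ never exceeds one, both corrections can only enlarge the estimated variances relative to the uncorrected White matrix, which is the direction the Monte Carlo evidence cited above calls for.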
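Example 11.2 The White Estimator
Using White's estimator for the regression in Example 11.1 produces the results in the row labeled "White S.E." in Table 11.1. The two income coefficients are individually and jointly statistically significant based on the individual t ratios and F(2, 67) = [(0.244 − 0.064)/2]/[0.776/(72 − 5)] = 7.771. The 1 percent critical value is 4.94.

The differences in the estimated standard errors seem fairly minor given the extreme heteroscedasticity. One surprise is the decline in the standard error of the age coefficient. The F test is no longer available for testing the joint significance of the two income coefficients because it relies on homoscedasticity. A Wald test, however, may be used in any event. The chi-squared test is based on

$$W = (Rb)'\left[R\left(\text{Est. Asy. Var}[b]\right)R'\right]^{-1}(Rb), \quad \text{where } R = \begin{bmatrix} 0 & 0 & 0 & 1 & 0 \\ 0 & 0 & 0 & 0 & 1 \end{bmatrix},$$

and the estimated asymptotic covariance matrix is the White estimator. The F statistic based on least squares is 7.771. The Wald statistic based on the White estimator is 20.604; the 95 percent critical value for the chi-squared distribution with two degrees of freedom is 5.99, so the conclusion is unchanged.

[5] See also Eicker (1967), Horn, Horn, and Duncan (1975), and MacKinnon and White (1985).

[6] See, e.g., MacKinnon and White (1985) and Messer and White (1984).

[7] They also suggest a third correction, $e_i^2/m_{ii}^2$, as an approximation to an estimator based on the "jackknife" technique, but their advocacy of this estimator is much weaker than that of the other two.

TABLE 11.1  Least Squares Regression Results

                  Constant        Age    OwnRent     Income    Income²
Sample Mean                     32.08       0.36      3.369
Coefficient        −237.15    −3.0818     27.941     234.35    −14.997
Standard Error      199.35     5.5147     82.922     80.366     7.4693
t ratio              −1.10    −0.5590      0.337      2.916     −2.008
White S.E.          212.99     3.3017     92.188     88.866     6.9446
D. and M. (1)       270.79     3.4227     95.566     92.122     7.1991
D. and M. (2)       221.09     3.4477     95.632     92.083     7.1995

R² = 0.243578, s = 284.75080. Mean Expenditure = $189.02. Income is ×$10,000.
Tests for Heteroscedasticity: White = 14.329, Goldfeld–Quandt = 15.001, Breusch–Pagan = 41.920, Koenker–Bassett = 6.187. (Two degrees of freedom. χ²* = 5.99.)

11.3 GMM ESTIMATION OF THE HETEROSCEDASTIC REGRESSION MODEL

The GMM estimator in the heteroscedastic regression model is produced by the empirical moment equations

$$\frac{1}{n}\sum_{i=1}^{n}x_i\left(y_i - x_i'\hat\beta_{GMM}\right) = \frac{1}{n}X'\hat\varepsilon\left(\hat\beta_{GMM}\right) = \bar m\left(\hat\beta_{GMM}\right) = 0. \tag{11-10}$$

The estimator is obtained by minimizing

$$q = \bar m'\left(\hat\beta_{GMM}\right)W\,\bar m\left(\hat\beta_{GMM}\right),$$

where W is a positive definite weighting matrix. The optimal weighting matrix would be

$$W = \left\{\text{Asy. Var}\left[\sqrt{n}\,\bar m(\beta)\right]\right\}^{-1},$$

which is the inverse of

$$\text{Asy. Var}\left[\sqrt{n}\,\bar m(\beta)\right] = \text{Asy. Var}\left[\frac{1}{\sqrt{n}}\sum_{i=1}^{n}x_i\varepsilon_i\right] = \underset{n\to\infty}{\text{plim}}\,\frac{1}{n}\sum_{i=1}^{n}\sigma^2\omega_i x_i x_i' = \sigma^2 Q^*$$

[see (11-1)]. The optimal weighting matrix would be $[\sigma^2 Q^*]^{-1}$.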
But recall that this minimization problem is an exactly identified case, so the weighting matrix is irrelevant to the solution. You can see that in the moment equation: that equation is simply the normal equations for least squares. We can solve the moment equations exactly, so there is no need for the weighting matrix. Regardless of the covariance matrix of the moments, the GMM estimator for the heteroscedastic regression model is ordinary least squares. (This is Case 2 analyzed in Section 10.4.) We can use the results we have already obtained to find its asymptotic covariance matrix. The result appears in Section 11.2. The implied estimator is the White estimator in (11-9). [Once again, see Theorem 10.6.] The conclusion to be drawn at this point is that until we make some specific assumptions about the variances, we do not have a more efficient estimator than least squares, but we do have to modify the estimated asymptotic covariance matrix.

11.4 TESTING FOR HETEROSCEDASTICITY

Heteroscedasticity poses potentially severe problems for inferences based on least squares. One can rarely be certain whether the disturbances are heteroscedastic, however, or, if they are, what form the heteroscedasticity takes. As such, it is useful to be able to test for homoscedasticity and, if necessary, modify our estimation procedures accordingly. [8] Several types of tests have been suggested. They can be roughly grouped in descending order in terms of their generality and, as might be expected, in ascending order in terms of their power. [9]

Most of the tests for heteroscedasticity are based on the following strategy. Ordinary least squares is a consistent estimator of $\beta$ even in the presence of heteroscedasticity. As such, the ordinary least squares residuals will mimic, albeit imperfectly because of sampling variability, the heteroscedasticity of the true disturbances. Therefore, tests designed to detect heteroscedasticity will, in most cases, be applied to the ordinary least squares residuals.

11.4.1 WHITE'S GENERAL TEST

To formulate most of the available tests, it is necessary to specify, at least in rough terms, the nature of the heteroscedasticity. It would be desirable to be able to test a general hypothesis of the form

$$H_0: \sigma_i^2 = \sigma^2 \text{ for all } i, \qquad H_1: \text{Not } H_0.$$
In view of our earlier findings on the difficulty of estimation in a model with n unknown parameters, this is rather ambitious. Nonetheless, such a test has been devised by White (1980b). The correct covariance matrix for the least squares estimator is

$$\text{Var}[b \mid X] = \sigma^2[X'X]^{-1}[X'\Omega X][X'X]^{-1}, \tag{11-11}$$

which, as we have seen, can be estimated using (11-9). The conventional estimator is $V = s^2[X'X]^{-1}$. If there is no heteroscedasticity, then V will give a consistent estimator of $\text{Var}[b \mid X]$, whereas if there is, then it will not. White has devised a statistical test based on this observation. A simple operational version of his test is carried out by obtaining $nR^2$ in the regression of $e_i^2$ on a constant and all unique variables contained in x and all the squares and cross products of the variables in x. The statistic is asymptotically distributed as chi-squared with $P - 1$ degrees of freedom, where P is the number of regressors in the equation, including the constant.

The White test is extremely general. To carry it out, we need not make any specific assumptions about the nature of the heteroscedasticity. Although this characteristic is a virtue, it is, at the same time, a potentially serious shortcoming. The test may reveal heteroscedasticity, but it may instead simply identify some other specification error (such as the omission of $x^2$ from a simple regression). [10] Except in the context of a specific problem, little can be said about the power of White's test; it may be very low against some alternatives. In addition, unlike some of the other tests we shall discuss, the White test is nonconstructive. If we reject the null hypothesis, then the result of the test gives no indication of what to do next.

[8] There is the possibility that a preliminary test for heteroscedasticity will incorrectly lead us to use weighted least squares or fail to alert us to heteroscedasticity and lead us improperly to use ordinary least squares. Some limited results on the properties of the resulting estimator are given by Ohtani and Toyoda (1980). Their results suggest that it is best to test first for heteroscedasticity rather than merely to assume that it is present.

[9] A study that examines the power of several tests for heteroscedasticity is Ali and Giaccotto (1984).
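The operational version of White's test described above can be sketched as follows. This is an illustration on simulated data, not a reproduction of the results reported in this chapter; the function name `white_test` and the data-generating choices are hypothetical. The auxiliary regressors are built as all products $x_j x_k$ with $j \le k$, which, because x contains a constant, covers the levels, squares, and cross products; exactly duplicated columns (for example the square of a dummy variable) are dropped.

```python
import numpy as np
from itertools import combinations_with_replacement


def white_test(X, y):
    """Return (n * R^2, degrees of freedom); X must hold a constant in column 0."""
    n, K = X.shape
    b, *_ = np.linalg.lstsq(X, y, rcond=None)
    e2 = (y - X @ b) ** 2
    # All products x_j * x_k with j <= k. Products with the constant reproduce
    # the constant and the levels.
    pairs = combinations_with_replacement(range(K), 2)
    Z = np.column_stack([X[:, j] * X[:, k] for j, k in pairs])
    Z = np.unique(Z, axis=1)  # keep only the unique columns
    a, *_ = np.linalg.lstsq(Z, e2, rcond=None)
    resid = e2 - Z @ a
    r2 = 1.0 - (resid @ resid) / np.sum((e2 - e2.mean()) ** 2)
    return n * r2, Z.shape[1] - 1


rng = np.random.default_rng(2)  # hypothetical data
n = 500
age = rng.uniform(20.0, 60.0, size=n)
own = rng.integers(0, 2, size=n).astype(float)
income = rng.uniform(1.0, 10.0, size=n)
X = np.column_stack([np.ones(n), age, own, income, income * income])
signal = X @ np.array([1.0, 0.1, 0.5, 2.0, -0.1])
y_hom = signal + rng.normal(size=n)               # homoscedastic disturbances
y_het = signal + rng.normal(size=n) * income**2   # variance rises with income

stat_hom, df = white_test(X, y_hom)
stat_het, _ = white_test(X, y_het)
print(df, round(stat_hom, 2), round(stat_het, 2))
```

With five regressors of this kind (a constant, a continuous variable, a dummy, and a variable with its square), 15 products reduce to 13 unique columns, so the statistic is referred to the chi-squared distribution with 12 degrees of freedom.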
11.4.2 THE GOLDFELD–QUANDT TEST

By narrowing our focus somewhat, we can obtain a more powerful test. Two tests that are relatively general are the Goldfeld–Quandt (1965) test and the Breusch–Pagan (1979) Lagrange multiplier test.

For the Goldfeld–Quandt test, we assume that the observations can be divided into two groups in such a way that under the hypothesis of homoscedasticity, the disturbance variances would be the same in the two groups, whereas under the alternative, the disturbance variances would differ systematically. The most favorable case for this would be the groupwise heteroscedastic model of Section 11.7.2 and Example 11.7 or a model such as $\sigma_i^2 = \sigma^2 x_i^2$ for some variable x. By ranking the observations based on this x, we can separate the observations into those with high and low variances. The test is applied by dividing the sample into two groups with $n_1$ and $n_2$ observations. To obtain statistically independent variance estimators, the regression is then estimated separately with the two sets of observations. The test statistic is

$$F[n_1 - K,\, n_2 - K] = \frac{e_1'e_1/(n_1 - K)}{e_2'e_2/(n_2 - K)}, \tag{11-12}$$

where we assume that the disturbance variance is larger in the first sample. (If not, then reverse the subscripts.) Under the null hypothesis of homoscedasticity, this statistic has an F distribution with $n_1 - K$ and $n_2 - K$ degrees of freedom. The sample value can be referred to the standard F table to carry out the test, with a large value leading to rejection of the null hypothesis.

To increase the power of the test, Goldfeld and Quandt suggest that a number of observations in the middle of the sample be omitted. The more observations that are dropped, however, the smaller the degrees of freedom for estimation in each group will be, which will tend to diminish the power of the test. As a consequence, the choice of how many central observations to drop is largely subjective. Evidence by Harvey and Phillips (1974) suggests that no more than a third of the observations should be dropped. If the disturbances are normally distributed, then the Goldfeld–Quandt statistic is exactly distributed as F under the null hypothesis and the nominal size of the test is correct. If not, then the F distribution is only approximate and some alternative method with known large-sample properties, such as White's test, might be preferable.
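A minimal sketch of the Goldfeld–Quandt computation in (11-12) follows. The function name `goldfeld_quandt`, the `drop` argument for omitting central observations, and the simulated data are assumptions made for illustration. The group with the larger estimated variance is placed in the numerator, as the text prescribes.

```python
import numpy as np


def goldfeld_quandt(X, y, sort_by, drop=0):
    """Return (F, numerator df, denominator df) after sorting on sort_by."""
    order = np.argsort(sort_by)
    X, y = X[order], y[order]
    n, K = X.shape
    n_low = (n - drop) // 2       # observations with the smallest sort values
    n_high = n - drop - n_low     # observations with the largest sort values

    def s2(Xg, yg):
        # separate regression in one group: e'e / (n_g - K)
        b, *_ = np.linalg.lstsq(Xg, yg, rcond=None)
        e = yg - Xg @ b
        return (e @ e) / (Xg.shape[0] - K)

    s2_low = s2(X[:n_low], y[:n_low])
    s2_high = s2(X[n - n_high:], y[n - n_high:])
    if s2_high >= s2_low:
        return s2_high / s2_low, n_high - K, n_low - K
    return s2_low / s2_high, n_low - K, n_high - K


rng = np.random.default_rng(3)  # hypothetical data
n = 72
age = rng.uniform(20.0, 60.0, size=n)
own = rng.integers(0, 2, size=n).astype(float)
income = rng.uniform(1.0, 10.0, size=n)
X = np.column_stack([np.ones(n), age, own, income, income * income])
y = X @ np.array([1.0, 0.1, 0.5, 2.0, -0.1]) + rng.normal(size=n) * income

F, df_num, df_den = goldfeld_quandt(X, y, sort_by=income)
print(round(F, 3), df_num, df_den)
```

With 72 observations, five regressors, and no observations dropped, each group has 36 − 5 = 31 degrees of freedom. Calling the function with `drop=24` omits the middle third and leaves 19 degrees of freedom in each group.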
11.4.3 THE BREUSCH–PAGAN/GODFREY LM TEST

The Goldfeld–Quandt test has been found to be reasonably powerful when we are able to identify correctly the variable to use in the sample separation. This requirement does limit its generality, however. For example, several of the models we will consider allow the disturbance variance to vary with a set of regressors. Breusch and Pagan [11] have devised a Lagrange multiplier test of the hypothesis that $\sigma_i^2 = \sigma^2 f(\alpha_0 + \alpha' z_i)$, where $z_i$ is a vector of independent variables. [12] The model is homoscedastic if $\alpha = 0$. The test can be carried out with a simple regression:

LM = ½ explained sum of squares in the regression of $e_i^2/(e'e/n)$ on $z_i$.

For computational purposes, let Z be the $n \times P$ matrix of observations on $(1, z_i)$, and let g be the vector of observations of $g_i = e_i^2/(e'e/n) - 1$. Then

$$\text{LM} = \tfrac{1}{2}\left[g'Z(Z'Z)^{-1}Z'g\right].$$

Under the null hypothesis of homoscedasticity, LM has a limiting chi-squared distribution with degrees of freedom equal to the number of variables in $z_i$. This test can be applied to a variety of models, including, for example, those examined in Example 11.3 (3) and in Section 11.7. [13]

It has been argued that the Breusch–Pagan Lagrange multiplier test is sensitive to the assumption of normality. Koenker (1981) and Koenker and Bassett (1982) suggest that the computation of LM be based on a more robust estimator of the variance of $\varepsilon_i^2$,

$$V = \frac{1}{n}\sum_{i=1}^{n}\left[e_i^2 - \frac{e'e}{n}\right]^2.$$

The variance of $\varepsilon_i^2$ is not necessarily equal to $2\sigma^4$ if $\varepsilon_i$ is not normally distributed. Let u equal $(e_1^2, e_2^2, \ldots, e_n^2)$ and i be an $n \times 1$ column of 1s. Then $\bar u = e'e/n$. With this change, the computation becomes

$$\text{LM} = \frac{1}{V}(u - \bar u\, i)'Z(Z'Z)^{-1}Z'(u - \bar u\, i).$$

Under normality, this modified statistic will have the same asymptotic distribution as the Breusch–Pagan statistic, but absent normality, there is some evidence that it provides a more powerful test. Waldman (1983) has shown that if the variables in $z_i$ are the same as those used for the White test described earlier, then the two tests are algebraically the same.

[10] Thursby (1982) considers this issue in detail.

Example 11.3 Testing for Heteroscedasticity
1. White's Test: For the data used in Example 11.1, there are 15 variables in $x \otimes x$ including the constant term. But since OwnRent² = OwnRent and Income × Income = Income², only 13 are unique. Regression of the squared least squares residuals on these 13 variables produces $R^2 = 0.199013$. The chi-squared statistic is therefore 72(0.199013) = 14.329.
The 95 percent critical value of chi-squared with 12 degrees of freedom is 21.03, so despite what might seem to be obvious in Figure 11.1, the hypothesis of homoscedasticity is not rejected by this test.

2. Goldfeld–Quandt Test: The 72 observations are sorted by Income, and then the regression is computed with the first 36 observations and then with the second 36. The two sums of squares are 326,427 and 4,894,130, so the test statistic is F[31, 31] = 4,894,130/326,427 = 15.001. The critical value from the F table is 1.79, so this test reaches the opposite conclusion.

3. Breusch–Pagan Test: This test requires a specific alternative hypothesis. For this purpose, we specify the test based on z = [1, Income, IncomeSq]. Using the least squares residuals, we compute $g_i = e_i^2/(e'e/72) - 1$; then $\text{LM} = \tfrac{1}{2}g'Z(Z'Z)^{-1}Z'g$. The sum of squares is 5,432,562.033. The computation produces LM = 41.920. The critical value for the chi-squared distribution with two degrees of freedom is 5.99, so the hypothesis of homoscedasticity is rejected. The Koenker and Bassett variant of this statistic is only 6.187, which is still significant but much smaller than the LM statistic. The wide difference between these two statistics suggests that the assumption of normality is erroneous. Absent any knowledge of the heteroscedasticity, we might use the Bera and Jarque (1981, 1982) and Kiefer and Salmon (1983) test for normality,

$$\chi^2[2] = n\left[\frac{(m_3/s^3)^2}{6} + \frac{(m_4/s^4 - 3)^2}{24}\right],$$

where $m_j = (1/n)\sum_i e_i^j$. Under the null hypothesis of homoscedastic and normally distributed disturbances, this statistic has a limiting chi-squared distribution with two degrees of freedom. Based on the least squares residuals, the value is 482.12, which certainly does lead to rejection of the hypothesis. Some caution is warranted here, however. It is unclear what part of the hypothesis should be rejected. We have convincing evidence in Figure 11.1 that the disturbances are heteroscedastic, so the assumption of homoscedasticity underlying this test is questionable. This does suggest the need to examine the data before applying a specification test such as this one.

[11] Breusch and Pagan (1979).

[12] Lagrange multiplier tests are discussed in Section 17.5.3.

[13] The model $\sigma_i^2 = \sigma^2\exp(\alpha' z_i)$ is one of these cases. In analyzing this model specifically, Harvey (1976) derived the same test statistic.
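The Breusch–Pagan statistic and the Koenker and Bassett variant used in item 3 can be computed from the same projection. The sketch below runs on simulated data rather than the data of Example 11.1, and the function name `breusch_pagan` is hypothetical. It relies on the identity $v'Z(Z'Z)^{-1}Z'v = \hat v'\hat v$, where $\hat v$ holds the fitted values from regressing v on Z.

```python
import numpy as np


def breusch_pagan(e, Z):
    """Return (LM, Koenker-Bassett LM, df); Z must hold a constant in column 0."""
    u = e**2
    u_bar = u.mean()  # e'e/n

    def explained(v):
        # v'Z(Z'Z)^{-1}Z'v, computed as the squared length of the fitted values
        coef, *_ = np.linalg.lstsq(Z, v, rcond=None)
        fitted = Z @ coef
        return fitted @ fitted

    g = u / u_bar - 1.0
    lm = 0.5 * explained(g)
    V = np.mean((u - u_bar) ** 2)   # robust estimate of Var[eps_i^2]
    kb = explained(u - u_bar) / V
    return lm, kb, Z.shape[1] - 1


rng = np.random.default_rng(4)  # hypothetical data
n = 72
income = rng.uniform(1.0, 10.0, size=n)
X = np.column_stack([np.ones(n), income, income * income])
y = 1.0 + 2.0 * income + rng.normal(size=n) * income
b, *_ = np.linalg.lstsq(X, y, rcond=None)
e = y - X @ b

lm, kb, df = breusch_pagan(e, X)  # z = [1, Income, IncomeSq]
print(round(lm, 3), round(kb, 3), df)
```

Two identities are worth noting. The Koenker and Bassett statistic equals $nR^2$ from the regression of $e_i^2$ on Z, and the two statistics are related by $\text{LM} = \text{KB} \times V/(2\bar u^2)$, so they coincide when V is close to $2\hat\sigma^4$, as it would be under normality.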
11.5 WEIGHTED LEAST SQUARES WHEN Ω IS KNOWN

Having tested for and found evidence of heteroscedasticity, the logical next step is to revise the estimation technique to account for it. The GLS estimator is

$$\hat\beta = (X'\Omega^{-1}X)^{-1}X'\Omega^{-1}y.$$

Consider the most general case, $\text{Var}[\varepsilon_i \mid x_i] = \sigma_i^2 = \sigma^2\omega_i$. Then $\Omega^{-1}$ is a diagonal matrix whose ith diagonal element is $1/\omega_i$. The GLS estimator is obtained by regressing

$$Py = \begin{bmatrix} y_1/\sqrt{\omega_1} \\ y_2/\sqrt{\omega_2} \\ \vdots \\ y_n/\sqrt{\omega_n} \end{bmatrix} \quad \text{on} \quad PX = \begin{bmatrix} x_1'/\sqrt{\omega_1} \\ x_2'/\sqrt{\omega_2} \\ \vdots \\ x_n'/\sqrt{\omega_n} \end{bmatrix}.$$

Applying ordinary least squares to the transformed model, we obtain the weighted least squares (WLS) estimator,

$$\hat\beta = \left[\sum_{i=1}^{n}w_i x_i x_i'\right]^{-1}\left[\sum_{i=1}^{n}w_i x_i y_i\right], \tag{11-13}$$

where $w_i = 1/\omega_i$. [14] The logic of the computation is that observations with smaller variances receive a larger weight in the computations of the sums and therefore have greater influence in the estimates obtained.

[14] The weights are often denoted $w_i = 1/\sigma_i^2$. This expression is consistent with the equivalent $\hat\beta = [X'(\sigma^2\Omega)^{-1}X]^{-1}X'(\sigma^2\Omega)^{-1}y$. The $\sigma^2$'s cancel, leaving the expression given previously.

A common specification is that the variance is proportional to one of the regressors or its square. Our earlier example of family expenditures is one in which the relevant variable is usually income. Similarly, in studies of firm profits, the dominant variable is typically assumed to be firm size. If

$$\sigma_i^2 = \sigma^2 x_{ik}^2,$$

then the transformed regression model for GLS is

$$\frac{y}{x_k} = \beta_k + \beta_1\left(\frac{x_1}{x_k}\right) + \beta_2\left(\frac{x_2}{x_k}\right) + \cdots + \frac{\varepsilon}{x_k}. \tag{11-14}$$

If the variance is proportional to $x_k$ instead of $x_k^2$, then the weight applied to each observation is $1/\sqrt{x_k}$ instead of $1/x_k$.

In (11-14), the coefficient on $x_k$ becomes the constant term. But if the variance is proportional to any power of $x_k$ other than two, then the transformed model will no longer contain a constant, and we encounter the problem of interpreting $R^2$ mentioned earlier. For example, no conclusion should be drawn if the $R^2$ in the regression of $y/z$ on $1/z$ and $x/z$ is higher than in the regression of y on a constant and x for any z, including x. The good fit of the weighted regression might be due to the presence of $1/z$ on both sides of the equality.

It is rarely possible to be certain about the nature of the heteroscedasticity in a regression model. In one respect, this problem is only minor.
The weighted least squares estimator

$$\hat\beta = \left[\sum_{i=1}^{n}w_i x_i x_i'\right]^{-1}\left[\sum_{i=1}^{n}w_i x_i y_i\right]$$

is consistent regardless of the weights used, as long as the weights are uncorrelated with the disturbances.

But using the wrong set of weights has two other consequences that may be less benign. First, the improperly weighted least squares estimator is inefficient. This point might be moot if the correct weights are unknown, but the GLS standard errors will also be incorrect. The asymptotic covariance matrix of the estimator

$$\hat\beta = [X'V^{-1}X]^{-1}X'V^{-1}y \tag{11-15}$$

is

$$\text{Asy. Var}[\hat\beta] = \sigma^2[X'V^{-1}X]^{-1}X'V^{-1}\Omega V^{-1}X[X'V^{-1}X]^{-1}. \tag{11-16}$$

This result may or may not resemble the usual estimator, which would be the matrix in brackets, and underscores the usefulness of the White estimator in (11-9).

The standard approach in the literature is to use OLS with the White estimator or some variant for the asymptotic covariance matrix. One could argue both flaws and virtues in this approach. In its favor, robustness to unknown heteroscedasticity is a compelling virtue. In the clear presence of heteroscedasticity, however, least squares can be extremely inefficient. The question becomes whether using the wrong weights is better than using no weights at all. There are several layers to the question. If we use one of the models discussed earlier (Harvey's, for example, is a versatile and flexible candidate), then we may use the wrong set of weights and, in addition, estimation of the variance parameters introduces a new source of variation into the slope estimators for the model. A heteroscedasticity robust estimator for weighted least squares can be formed by combining (11-16) with the White estimator. The weighted least squares estimator in (11-15) is consistent with any set of weights $V = \text{diag}[v_1, v_2, \ldots, v_n]$. Its asymptotic covariance matrix can be estimated with

$$\text{Est. Asy. Var}[\hat\beta] = (X'V^{-1}X)^{-1}\left[\sum_{i=1}^{n}\left(\frac{e_i^2}{v_i^2}\right)x_i x_i'\right](X'V^{-1}X)^{-1}. \tag{11-17}$$

Any consistent estimator can be used to form the residuals. The weighted least squares estimator is a natural candidate.
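Equations (11-15) and (11-17) translate directly into code. The following sketch assumes a user-supplied vector `v` of diagonal elements of V and forms the residuals from the weighted least squares estimator itself; the function name `wls_with_robust_cov` and the simulated data, in which the chosen weights are deliberately not the true ones, are hypothetical.

```python
import numpy as np


def wls_with_robust_cov(X, y, v):
    """Weighted least squares (11-15) with the robust covariance in (11-17)."""
    w = 1.0 / v
    A_inv = np.linalg.inv(X.T @ (w[:, None] * X))  # (X'V^{-1}X)^{-1}
    beta = A_inv @ (X.T @ (w * y))
    e = y - X @ beta                               # residuals from WLS
    meat = X.T @ ((e**2 / v**2)[:, None] * X)      # sum_i (e_i^2 / v_i^2) x_i x_i'
    return beta, A_inv @ meat @ A_inv


rng = np.random.default_rng(5)  # hypothetical data
n = 200
income = rng.uniform(1.0, 10.0, size=n)
X = np.column_stack([np.ones(n), income])
y = 1.0 + 2.0 * income + rng.normal(size=n) * income**1.5  # true variance ~ income^3

# Deliberately "wrong" weights: variance assumed proportional to income.
beta, cov = wls_with_robust_cov(X, y, v=income)
print(np.round(beta, 3), np.round(np.sqrt(np.diag(cov)), 3))
```

Two special cases serve as checks on the sketch. With `v` equal to a vector of ones, the estimator reduces to OLS and the covariance matrix to the White estimator. For any positive `v`, the coefficients equal those from OLS applied to the data divided by $\sqrt{v_i}$.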
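11.6 ESTIMATION WHEN Ω CONTAINS UNKNOWN PARAMETERS

The general form of the heteroscedastic regression model has too many parameters to estimate by ordinary methods. Typically, the model is restricted by formulating $\sigma^2\Omega$ as a function of a few parameters, as in $\sigma_i^2 = \sigma^2 x_i^\alpha$ or $\sigma_i^2 = \sigma^2[x_i'\alpha]^2$. Write this as $\Omega(\alpha)$. FGLS based on a consistent estimator of $\Omega(\alpha)$ (meaning a consistent estimator of $\alpha$) is asymptotically equivalent to full GLS, and FGLS based on a maximum likelihood estimator of $\Omega(\alpha)$ will produce a maximum likelihood estimator of $\beta$ if $\Omega(\alpha)$ does not contain any elements of $\beta$. The new problem is that we must first find consistent estimators of the unknown parameters in $\Omega(\alpha)$. Two methods are typically used, two-step GLS and maximum likelihood.

11.6.1 TWO-STEP ESTIMATION

For the heteroscedastic model, the GLS estimator is

$$\hat\beta = \left[\sum_{i=1}^{n}\left(\frac{1}{\sigma_i^2}\right)x_i x_i'\right]^{-1}\left[\sum_{i=1}^{n}\left(\frac{1}{\sigma_i^2}\right)x_i y_i\right]. \tag{11-18}$$

The two-step estimators are computed by first obtaining estimates $\hat\sigma_i^2$, usually using some function of the ordinary least squares residuals. Then, $\hat{\hat\beta}$ uses (11-18) and $\hat\sigma_i^2$.

The ordinary least squares estimator of $\beta$, although inefficient, is still consistent. As such, statistics computed using the ordinary least squares residuals, $e_i = (y_i - x_i'b)$, will have the same asymptotic properties as those computed using the true disturbances, $\varepsilon_i = (y_i - x_i'\beta)$. This result suggests a regression approach for the true disturbances and variables $z_i$ that may or may not coincide with $x_i$. Now $E[\varepsilon_i^2 \mid z_i] = \sigma_i^2$, so

$$\varepsilon_i^2 = \sigma_i^2 + v_i,$$

where $v_i$ is just the difference between $\varepsilon_i^2$ and its conditional expectation. Since $\varepsilon_i$ is unobservable, we would use the least squares residual, for which $e_i = \varepsilon_i - x_i'(b - \beta) = \varepsilon_i + u_i$. Then, $e_i^2 = \varepsilon_i^2 + u_i^2 + 2\varepsilon_i u_i$. But, in large samples, as $b \overset{p}{\longrightarrow} \beta$, terms in $u_i$ will become negligible, so that at least approximately, [15]

$$e_i^2 = \sigma_i^2 + v_i^*.$$

The procedure suggested is to treat the variance function as a regression and use the squares or some other functions of the least squares residuals as the dependent variable. [16] For example, if $\sigma_i^2 = z_i'\alpha$, then a consistent estimator of $\alpha$ will be the least squares slopes, a, in the "model,"

$$e_i^2 = z_i'\alpha + v_i^*.$$

In this model, $v_i^*$ is both heteroscedastic and autocorrelated, so a is consistent but inefficient. But, consistency is all that is required for asymptotically efficient estimation of $\beta$ using $\Omega(\hat\alpha)$. It remains to be settled whether improving the estimator of $\alpha$ in this and the other models we will consider would improve the small sample properties of the two-step estimator of $\beta$. [17]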
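The two-step procedure can be sketched as follows. One assumption should be flagged clearly: instead of the linear variance function $\sigma_i^2 = z_i'\alpha$ used as the example above, the sketch uses the exponential form $\sigma_i^2 = \sigma^2\exp(z_i'\alpha)$ and regresses $\ln e_i^2$ on $z_i$, which keeps every fitted variance positive. The function name `fgls`, the `iterations` argument, and the simulated data are hypothetical.

```python
import numpy as np


def fgls(X, y, Z, iterations=1):
    """Two-step FGLS for sigma_i^2 = sigma^2 exp(z_i'alpha); Z holds a constant."""
    b, *_ = np.linalg.lstsq(X, y, rcond=None)  # step 1: OLS
    a = None
    for _ in range(iterations):
        e = y - X @ b
        # variance regression: ln e_i^2 on z_i
        a, *_ = np.linalg.lstsq(Z, np.log(e**2), rcond=None)
        w = np.exp(-(Z @ a))                   # weights 1 / fitted variance
        # step 2: weighted least squares as in (11-18)
        b = np.linalg.solve(X.T @ (w[:, None] * X), X.T @ (w * y))
    return b, a


rng = np.random.default_rng(6)  # hypothetical data
n = 2000
x = rng.uniform(1.0, 5.0, size=n)
X = np.column_stack([np.ones(n), x])
Z = np.column_stack([np.ones(n), np.log(x)])
y = 1.0 + 2.0 * x + rng.normal(size=n) * x     # sigma_i^2 = x_i^2, so alpha = 2

b_fgls, a_hat = fgls(X, y, Z)
print(np.round(b_fgls, 3), np.round(a_hat, 3))
```

The intercept in the log regression is not a consistent estimator of $\ln\sigma^2$, because $E[\ln\varepsilon_i^2]$ differs from $\ln\sigma_i^2$ by a constant, but that constant only rescales every weight by the same factor and so leaves the weighted least squares coefficients unchanged. Setting `iterations` above one recomputes the residuals from the FGLS estimates and repeats the variance regression.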
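The two-step estimator may be iterated by recomputing the residuals after computing the FGLS estimates and then reentering the computation. The asymptotic properties of the iterated estimator are the same as those of the two-step estimator, however. In some cases, this sort of iteration will produce the maximum likelihood estimator at convergence. Yet none of the estimators based on regression of squared residuals on other variables satisfy the requirement. Thus, iteration in this context provides little additional benefit, if any.

11.6.2 MAXIMUM LIKELIHOOD ESTIMATION [18]

The log-likelihood function for a sample of normally distributed observations is

$$\ln L = -\frac{n}{2}\ln(2\pi) - \frac{1}{2}\sum_{i=1}^{n}\left[\ln\sigma_i^2 + \frac{1}{\sigma_i^2}(y_i - x_i'\beta)^2\right].$$

For simplicity, let

$$\sigma_i^2 = \sigma^2 f_i(\alpha), \tag{11-19}$$

where $\alpha$ is the vector of unknown parameters in $\Omega(\alpha)$ and $f_i(\alpha)$ is indexed by i to indicate that it is a function of $z_i$; note that $\Omega(\alpha) = \text{diag}[f_i(\alpha)]$, so it is also. Assume as well that no elements of $\beta$ appear in $\alpha$. The log-likelihood function is

$$\ln L = -\frac{n}{2}\left[\ln(2\pi) + \ln\sigma^2\right] - \frac{1}{2}\sum_{i=1}^{n}\left[\ln f_i(\alpha) + \frac{1}{\sigma^2}\left(\frac{1}{f_i(\alpha)}\right)(y_i - x_i'\beta)^2\right].$$

For convenience in what follows, substitute $\varepsilon_i$ for $(y_i - x_i'\beta)$, denote $f_i(\alpha)$ as simply $f_i$, and denote the vector of derivatives $\partial f_i(\alpha)/\partial\alpha$ as $g_i$. Then, the derivatives of the log-likelihood function are

$$\begin{aligned}
\frac{\partial \ln L}{\partial\beta} &= \sum_{i=1}^{n}x_i\frac{\varepsilon_i}{\sigma^2 f_i}, \\
\frac{\partial \ln L}{\partial\sigma^2} &= -\frac{n}{2\sigma^2} + \frac{1}{2\sigma^4}\sum_{i=1}^{n}\frac{\varepsilon_i^2}{f_i} = \sum_{i=1}^{n}\left(\frac{1}{2\sigma^2}\right)\left(\frac{\varepsilon_i^2}{\sigma^2 f_i} - 1\right), \\
\frac{\partial \ln L}{\partial\alpha} &= \sum_{i=1}^{n}\left(\frac{1}{2}\right)\left(\frac{\varepsilon_i^2}{\sigma^2 f_i} - 1\right)\left(\frac{1}{f_i}\right)g_i.
\end{aligned} \tag{11-20}$$

Since $E[\varepsilon_i \mid x_i, z_i] = 0$ and $E[\varepsilon_i^2 \mid x_i, z_i] = \sigma^2 f_i$, it is clear that all derivatives have expectation zero as required. The maximum likelihood estimators are those values of $\beta$, $\sigma^2$, and $\alpha$ that simultaneously equate these derivatives to zero. The likelihood equations are generally highly nonlinear and will usually require an iterative solution.

Let G be the $n \times M$ matrix with ith row equal to $\partial f_i/\partial\alpha' = g_i'$ and let i denote an $n \times 1$ column vector of 1s. The asymptotic covariance matrix for the maximum likelihood estimator in this model is

$$\left(-E\left[\frac{\partial^2 \ln L}{\partial\gamma\,\partial\gamma'}\right]\right)^{-1} = \begin{bmatrix} (1/\sigma^2)X'\Omega^{-1}X & 0 & 0 \\ 0' & n/(2\sigma^4) & (1/(2\sigma^2))\,i'\Omega^{-1}G \\ 0' & (1/(2\sigma^2))\,G'\Omega^{-1}i & (1/2)\,G'\Omega^{-2}G \end{bmatrix}^{-1}, \tag{11-21}$$

where $\gamma' = [\beta', \sigma^2, \alpha']$. (One convenience is that terms involving $\partial^2 f_i/\partial\alpha\,\partial\alpha'$ fall out of the expectations. The proof is considered in the exercises.)

[15] See Amemiya (1985) for formal analysis.

[16] See, for example, Jobson and Fuller (1980).

[17] Fomby, Hill, and Johnson (1984, pp. 177–186) and Amemiya (1985, pp. 203–207; 1977a) examine this model.

[18] The method of maximum likelihood estimation is developed in Chapter 17.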
From the likelihood equations, it is apparent that for a given value of $\alpha$, the solution for $\beta$ is the GLS estimator. The scale parameter, $\sigma^2$, is ultimately irrelevant to this solution. The second likelihood equation shows that for given values of $\beta$ and $\alpha$, $\sigma^2$ will be estimated as the mean of the squared generalized residuals, $\hat\sigma^2 = (1/n)\sum_{i=1}^{n}\left[(y_i - x_i'\hat\beta)^2/\hat f_i\right]$. This term is the generalized sum of squares. Finally, there is no general solution to be found for the estimator of $\alpha$; it depends on the model. We will examine two examples. If $\alpha$ is only a single parameter, then it may be simplest just to scan a range of values of $\alpha$ to locate the one that, with the associated FGLS estimator of $\beta$, maximizes the log-likelihood. The fact that the Hessian is block diagonal does provide an additional convenience. The parameter vector $\beta$ may always be estimated conditionally on $[\sigma^2, \alpha]$ and, likewise, if $\beta$ is given, then the solutions for $\sigma^2$ and $\alpha$ can be found conditionally, although this may be a complicated optimization problem. But, by going back and forth in this fashion, as suggested by Oberhofer and Kmenta (1974), we may be able to obtain the full solution more easily than by approaching the full set of equations simultaneously.

11.6.3 MODEL BASED TESTS FOR HETEROSCEDASTICITY

The tests for heteroscedasticity described in Section 11.4 are based on the behavior of the least squares residuals. The general approach is based on the idea that if heteroscedasticity of any form is present in the disturbances, it will be discernible in the behavior of the residuals. Those residual based tests are robust in the sense that they will detect heteroscedasticity of a variety of forms. On the other hand, their power is a function of the specific alternative. The model considered here is fairly narrow. The tradeoff is that within the context of the specified model, a test of heteroscedasticity will have greater power than the residual based tests. (To come full circle, of course, that means that if the model specification is incorrect, the tests are likely to have limited or no power at all to reveal an incorrect hypothesis of homoscedasticity.)

Testing the hypothesis of homoscedasticity using any of the three standard methods is particularly simple in the model outlined in this section. The trio of tests for parametric models is available. The model would generally be formulated so that the heteroscedasticity is induced by a nonzero $\alpha$.
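Thus, we take the test of $H_0: \alpha = 0$ to be a test against homoscedasticity.

Wald Test. The Wald statistic is computed by extracting from the full parameter vector and its estimated asymptotic covariance matrix the subvector $\hat\alpha$ and its asymptotic covariance matrix. Then,

$$W = \hat\alpha'\left\{\text{Est. Asy. Var}[\hat\alpha]\right\}^{-1}\hat\alpha.$$

Likelihood Ratio Test. The results of the homoscedastic least squares regression are generally used to obtain the initial values for the iterations. The restricted log-likelihood value is a by-product of the initial setup; $\ln L_R = -(n/2)[1 + \ln 2\pi + \ln(e'e/n)]$. The unrestricted log-likelihood, $\ln L_U$, is obtained as the objective function for the estimation. Then, the statistic for the test is

$$\text{LR} = -2(\ln L_R - \ln L_U).$$

Lagrange Multiplier Test. To set up the LM test, we refer back to the model in (11-19)–(11-21). At the restricted estimates $\alpha = 0$, $\beta = b$, $\sigma^2 = e'e/n$ (not $n - K$), $f_i = 1$ and $\Omega(0) = I$. Thus, the first derivatives vector evaluated at the least squares estimates is

$$\begin{aligned}
\frac{\partial \ln L}{\partial\beta}\left(\beta = b, \sigma^2 = e'e/n, \hat\alpha = 0\right) &= 0, \\
\frac{\partial \ln L}{\partial\sigma^2}\left(\beta = b, \sigma^2 = e'e/n, \hat\alpha = 0\right) &= 0, \\
\frac{\partial \ln L}{\partial\alpha}\left(\beta = b, \sigma^2 = e'e/n, \hat\alpha = 0\right) &= \sum_{i=1}^{n}\frac{1}{2}\left(\frac{e_i^2}{e'e/n} - 1\right)g_i = \sum_{i=1}^{n}\frac{1}{2}v_i g_i.
\end{aligned}$$

The inverse of the negative expected Hessian, from (11-21), is

$$\left(-E\left[\frac{\partial^2 \ln L}{\partial\gamma\,\partial\gamma'}\right]_{\alpha = 0}\right)^{-1} = \begin{bmatrix} (1/\sigma^2)X'X & 0 & 0 \\ 0' & n/(2\sigma^4) & [1/(2\sigma^2)]\,g' \\ 0' & [1/(2\sigma^2)]\,g & (1/2)\,G'G \end{bmatrix}^{-1} = \left\{-E[H]\right\}^{-1},$$

where $g = \sum_{i=1}^{n}g_i$ and $G'G = \sum_{i=1}^{n}g_i g_i'$. The LM statistic will be

$$\text{LM} = \left[\frac{\partial \ln L}{\partial\gamma}\left(\gamma = b, e'e/n, 0\right)\right]'\left\{-E[H]\right\}^{-1}\left[\frac{\partial \ln L}{\partial\gamma}\left(\gamma = b, e'e/n, 0\right)\right].$$

With a bit of algebra and using (B-66) for the partitioned inverse, you can show that this reduces to

$$\text{LM} = \frac{1}{2}\left[\sum_{i=1}^{n}v_i g_i\right]'\left[\sum_{i=1}^{n}(g_i - \bar g)(g_i - \bar g)'\right]^{-1}\left[\sum_{i=1}^{n}v_i g_i\right].$$

This result, as given by Breusch and Pagan (1980), is simply one half times the regression sum of squares in the regression of $v_i$ on a constant and $g_i$. This actually simplifies even further if, as in the cases studied by Breusch and Pagan, the variance function is $f_i = f(z_i'\alpha)$ where $f(z_i'0) = 1$. Then, the derivative will be of the form $g_i = r(z_i'\alpha)z_i$ and it will follow that $r(z_i'0)$ = a constant. In this instance, the same statistic will result from the regression of $v_i$ on a constant and $z_i$, which is the result reported in Section 11.4.3. The remarkable aspect of the result is that the same statistic results regardless of the choice of variance function, so long as it satisfies $f_i = f(z_i'\alpha)$ where $f(z_i'0) = 1$. The model studied by Harvey, for example, has $f_i = \exp(z_i'\alpha)$, so $g_i = z_i$ when $\alpha = 0$.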
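For a model with a single variance parameter, the scan over $\alpha$ suggested in Section 11.6.2 and the likelihood ratio test above fit in one short routine. The sketch assumes the variance function $f_i = z_i^\alpha$ with a positive scalar $z_i$, simulated data, and a hypothetical function name `concentrated_loglik`. Substituting the GLS estimator of $\beta$ and $\hat\sigma^2 = (1/n)\sum_i \hat\varepsilon_i^2/f_i$ into the log-likelihood gives the concentrated function $-(n/2)[1 + \ln 2\pi + \ln\hat\sigma^2] - \tfrac{1}{2}\sum_i \ln f_i$, which reduces to $\ln L_R$ at $\alpha = 0$.

```python
import numpy as np


def concentrated_loglik(alpha, X, y, z):
    """Log-likelihood at alpha with beta and sigma^2 concentrated out (f_i = z_i^alpha)."""
    f = z**alpha
    w = 1.0 / f
    beta = np.linalg.solve(X.T @ (w[:, None] * X), X.T @ (w * y))  # GLS given alpha
    eps = y - X @ beta
    sigma2 = np.mean(eps**2 / f)       # mean squared generalized residual
    n = y.shape[0]
    loglik = (-0.5 * n * (1.0 + np.log(2.0 * np.pi) + np.log(sigma2))
              - 0.5 * np.sum(np.log(f)))
    return loglik, beta, sigma2


rng = np.random.default_rng(7)  # hypothetical data
n = 2000
z = rng.uniform(1.0, 5.0, size=n)
X = np.column_stack([np.ones(n), z])
y = 1.0 + 2.0 * z + rng.normal(size=n) * z    # sigma_i^2 = z_i^2, so alpha = 2

grid = np.linspace(0.0, 5.0, 251)             # scan a range of values of alpha
values = np.array([concentrated_loglik(a, X, y, z)[0] for a in grid])
alpha_hat = grid[values.argmax()]
loglik_r = values[0]                          # alpha = 0: homoscedastic model
lr = -2.0 * (loglik_r - values.max())
print(round(alpha_hat, 2), round(lr, 2))
```

With a single restriction, LR would be compared with the chi-squared critical value for one degree of freedom (3.84 at the 95 percent level). A grid search of this kind locates $\hat\alpha$ only up to the grid spacing, so a finer grid or a one-dimensional optimizer would be needed for more precision.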
$= 0$.

Example 11.4  Two-Step Estimation of a Heteroscedastic Regression
Table 11.2 lists weighted least squares and two-step FGLS estimates of the parameters of the regression model in Example 11.1 using various formulations of the scedastic function. The method used to compute the weights for weighted least squares is given below each model formulation. The procedure was iterated to convergence for the model $\sigma_i^2 = \sigma^2 z_i^{\alpha}$; convergence required 13 iterations. (The two-step estimates are those computed by the first iteration.) ML estimates for this model are also shown. As often happens, the iteration produces fairly large changes in the estimates. There is also a considerable amount of variation produced by the different formulations.

For the model $f_i = z_i^{\alpha}$, the concentrated log-likelihood is simple to compute. We can find the maximum likelihood estimate for this model just by scanning over a range of values for $\alpha$. For any $\alpha$, the maximum likelihood estimator of $\beta$ is weighted least squares, with weights $w_i = 1/z_i^{\alpha}$. For our expenditure model, we use income for $z_i$. Figure 11.2 shows a plot of the log-likelihood function. The maximum occurs at $\alpha = 3.65$. This value, with the FGLS estimates of $\beta$, is shown in Table 11.2.

TABLE 11.2  Two-Step and Weighted Least Squares Estimates

                                  Constant       Age     OwnRent    Income    Income^2
$\sigma_i^2 = \sigma^2$ (OLS)
                          est.     -237.15    -3.0818     27.941    234.35     -14.997
                          s.e.      199.35     5.5147     82.922    80.366      7.4693
$\sigma_i^2 = \sigma^2 I_i$ (WLS)
                          est.     -181.87    -2.9350     50.494    202.17     -12.114
                          s.e.      165.52     4.6033     69.879    76.781      8.2731
$\sigma_i^2 = \sigma^2 I_i^2$ (WLS)
                          est.     -114.11    -2.6942     60.449    158.43     -7.2492
                          s.e.      139.69     3.8074     58.551    76.392      9.7243
$\sigma_i^2 = \sigma^2 \exp(z_i'\alpha)$  ($\ln e_i^2$ on $z_i = (1, \ln I_i)$)
                          est.     -117.88    -1.2337     50.950    145.30     -7.9383
                          s.e.      101.39     2.5512     52.814    46.363      3.7367
$\sigma_i^2 = \sigma^2 z_i^{\alpha}$ (2 Step)  ($\ln e_i^2$ on $(1, \ln z_i)$)
                          est.     -193.33    -2.9579     47.357    208.86     -12.769
                          s.e.      171.08     4.7627     72.139    77.198      8.0838
  (iterated)  ($\alpha = 1.7623$)
                          est.     -130.38    -2.7754     59.126    169.74     -8.5995
                          s.e.      145.03     3.9817    61.0434    76.180      9.3133
  (ML)  ($\alpha = 3.6513$)
                          est.     -19.929    -1.7058     58.102    75.970      4.3915
                          s.e.      113.06     2.7581    43.5084    81.040      13.433

FIGURE 11.2  Plot of Log-Likelihood Function. (Vertical axis: LOGLHREG; horizontal axis: ALPHA.)

Note that this value of $\alpha$ is very different from the value we obtained by iterative regression of the logs of the squared residuals on log
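income. In this model, $g_i = f_i \ln z_i$. If we insert this into the expression for $\partial \ln L/\partial \alpha$ and manipulate it a bit, we obtain the implicit solution

$$\sum_{i=1}^{n} \left(\frac{\varepsilon_i^2}{\sigma^2 z_i^{\alpha}} - 1\right) \ln z_i = 0.$$

(The $\frac{1}{2}$ disappears from the solution.) For given values of $\sigma^2$ and $\beta$, this result provides only an implicit solution for $\alpha$. In the next section, we examine a method for finding a solution. At this point, we note that the solution to this equation is clearly not obtained by regression of the logs of the squared residuals on $\log z_i$. Hence, the strategy we used for the two-step estimator does not seek the maximum likelihood estimator.

11.7  APPLICATIONS

This section will present two common applications of the heteroscedastic regression model: Harvey's model of multiplicative heteroscedasticity and a model of groupwise heteroscedasticity that extends to the disturbance variance some concepts that are usually associated with variation in the regression function.

11.7.1  MULTIPLICATIVE HETEROSCEDASTICITY

Harvey's (1976) model of multiplicative heteroscedasticity is a very flexible, general model that includes most of the useful formulations as special cases. The general formulation is

$$\sigma_i^2 = \sigma^2 \exp(z_i'\alpha).$$

The model examined in Example 11.4 has $z_i = \ln \text{income}_i$. More generally, a model with heteroscedasticity of the form

$$\sigma_i^2 = \sigma^2 \prod_{m=1}^{M} z_{im}^{\alpha_m}$$

results if the logs of the variables are placed in $z_i$. The groupwise heteroscedasticity model described below is produced by making $z_i$ a set of group dummy variables (one must be omitted). In this case, $\sigma^2$ is the disturbance variance for the base group whereas for the other groups, $\sigma_g^2 = \sigma^2 \exp(\alpha_g)$.

We begin with a useful simplification. Let $z_i$ include a constant term so that $z_i' = [1, q_i']$, where $q_i$ is the original set of variables, and let $\gamma' = [\ln \sigma^2, \alpha']$. Then, the model is simply $\sigma_i^2 = \exp(\gamma' z_i)$. Once the full parameter vector is estimated, $\exp(\gamma_1)$ provides the estimator of $\sigma^2$. (This estimator uses the invariance result for maximum likelihood estimation. See Section 17.4.5.d.)

The log-likelihood is

$$\ln L = -\frac{n}{2}\ln(2\pi) - \frac{1}{2}\sum_{i=1}^{n} \ln \sigma_i^2 - \frac{1}{2}\sum_{i=1}^{n} \frac{\varepsilon_i^2}{\sigma_i^2} = -\frac{n}{2}\ln(2\pi) - \frac{1}{2}\sum_{i=1}^{n} z_i'\gamma - \frac{1}{2}\sum_{i=1}^{n} \frac{\varepsilon_i^2}{\exp(z_i'\gamma)}.$$

The likelihood equations are

$$\frac{\partial \ln L}{\partial \beta} = \sum_{i=1}^{n} x_i \frac{\varepsilon_i}{\exp(z_i'\gamma)} = X'\Omega^{-1}\varepsilon = 0,$$

$$\frac{\partial \ln L}{\partial \gamma} = \frac{1}{2}\sum_{i=1}^{n} z_i \left(\frac{\varepsilon_i^2}{\exp(z_i'\gamma)} - 1\right) = 0.$$

For this model, the method of scoring turns out to be a particularly convenient way to maximize the log-likelihood function. The terms in the Hessian are

$$\frac{\partial^2 \ln L}{\partial \beta\, \partial \beta'} = -\sum_{i=1}^{n} \frac{1}{\exp(z_i'\gamma)} x_i x_i' = -X'\Omega^{-1}X,$$

$$\frac{\partial^2 \ln L}{\partial \beta\, \partial \gamma'} = -\sum_{i=1}^{n} \frac{\varepsilon_i}{\exp(z_i'\gamma)} x_i z_i',$$

$$\frac{\partial^2 \ln L}{\partial \gamma\, \partial \gamma'} = -\frac{1}{2}\sum_{i=1}^{n} \frac{\varepsilon_i^2}{\exp(z_i'\gamma)} z_i z_i'.$$

The expected value of $\partial^2 \ln L/\partial \beta\, \partial \gamma'$ is $0$ since $E[\varepsilon_i \mid x_i, z_i] = 0$. The expected value of the fraction in $\partial^2 \ln L/\partial \gamma\, \partial \gamma'$ is $E[\varepsilon_i^2/\sigma_i^2 \mid x_i, z_i] = 1$. Let $\delta = [\beta, \gamma]$. Then

$$-E\left[\frac{\partial^2 \ln L}{\partial \delta\, \partial \delta'}\right] = \begin{bmatrix} X'\Omega^{-1}X & 0 \\ 0' & \frac{1}{2} Z'Z \end{bmatrix} = -H.$$

The scoring method is

$$\delta_{t+1} = \delta_t - H_t^{-1} g_t,$$

where $\delta_t$ (i.e., $\beta_t$, $\gamma_t$, and $\Omega_t$) is the estimate at iteration $t$, $g_t$ is the two-part vector of first derivatives $[\partial \ln L/\partial \beta_t', \partial \ln L/\partial \gamma_t']'$ and $H_t$ is partitioned likewise. Since $H_t$ is block diagonal, the iteration can be written as separate equations:

$$\beta_{t+1} = \beta_t + (X'\Omega_t^{-1}X)^{-1}(X'\Omega_t^{-1}\varepsilon_t)$$
$$\phantom{\beta_{t+1}} = \beta_t + (X'\Omega_t^{-1}X)^{-1}X'\Omega_t^{-1}(y - X\beta_t)$$
$$\phantom{\beta_{t+1}} = (X'\Omega_t^{-1}X)^{-1}X'\Omega_t^{-1}y \quad \text{(of course)}.$$

Therefore, the updated coefficient vector $\beta_{t+1}$ is computed by FGLS using the previously computed estimate of $\gamma$ to compute $\Omega$. We use the same approach for $\gamma$:

$$\gamma_{t+1} = \gamma_t + [2(Z'Z)^{-1}] \left[\sum_{i=1}^{n} \frac{1}{2} z_i \left(\frac{\varepsilon_i^2}{\exp(z_i'\gamma)} - 1\right)\right].$$

The 2 and $\frac{1}{2}$ cancel. The updated value of $\gamma$ is computed by adding the vector of slopes in the least squares regression of $[\varepsilon_i^2/\exp(z_i'\gamma) - 1]$ on $z_i$ to the old one. Note that the correction is $2(Z'Z)^{-1}(\partial \ln L/\partial \gamma)$, so convergence occurs when the derivative is zero.

The remaining detail is to determine the starting value for the iteration. Since any consistent estimator will do, the simplest procedure is to use OLS for $\beta$ and the slopes in a regression of the logs of the squares of the least squares residuals on $z_i$ for $\gamma$. Harvey (1976) shows that this method will produce an inconsistent estimator of $\gamma_1 = \ln \sigma^2$, but the inconsistency can be corrected just by adding 1.2704 to the value obtained.$^{19}$ Thereafter, the iteration is simply:

1. Estimate the disturbance variance $\sigma_i^2$ with $\exp(\gamma_t' z_i)$.
2. Compute $\beta_{t+1}$ by FGLS.$^{20}$
3. Update $\gamma_t$ using the regression described in the preceding paragraph.
4. Compute $d_{t+1} = [\beta_{t+1}, \gamma_{t+1}] - [\beta_t, \gamma_t]$. If $d_{t+1}$ is large, then return to step 1.

If $d_{t+1}$ at step 4 is sufficiently small, then exit the iteration. The asymptotic covariance matrix is simply $-H^{-1}$, which is block diagonal with blocks

$$\text{Asy.Var}[\hat{\beta}_{ML}] = (X'\Omega^{-1}X)^{-1},$$
$$\text{Asy.Var}[\hat{\gamma}_{ML}] = 2(Z'Z)^{-1}.$$

If desired, then $\hat{\sigma}^2 = \exp(\hat{\gamma}_1)$ can be computed. The asymptotic variance would be $[\exp(\gamma_1)]^2 (\text{Asy.Var}[\hat{\gamma}_{1,ML}])$.

$^{19}$ He also presents a correction for the asymptotic covariance matrix for this first step estimator of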
$\gamma$.

$^{20}$ The two-step estimator obtained by stopping here would be fully efficient if the starting value for $\gamma$ were consistent, but it would not be the maximum likelihood estimator.

TABLE 11.3  Multiplicative Heteroscedasticity Model

                    Constant        Age    OwnRent     Income    Income^2
Ordinary Least Squares Estimates
Coefficient          -237.15    -3.0818     27.941     234.35     -14.997
Standard error        199.35     5.5147     82.922     80.366       7.469
t ratio                 -1.1     -0.559      0.337      2.916      -2.008
$R^2 = 0.243578$, $s = 284.75080$, Ln-L $= -506.488$

Maximum Likelihood Estimates (standard errors for estimates of $\gamma$ in parentheses)
Coefficient          -58.437   -0.37607     33.358     96.823     -3.3008
Standard error        62.098    0.55000     37.135     31.798      2.6248
t ratio               -0.941     -0.684      0.898      3.045      -1.448
$[\exp(c_1)]^{1/2} = 0.9792\ (0.79115)$, $c_2 = 5.355\ (0.37504)$, $c_3 = -0.56315\ (0.036122)$
Ln-L $= -465.9817$, Wald $= 251.423$, LR $= 81.0142$, LM $= 115.899$

Example 11.5  Multiplicative Heteroscedasticity
Estimates of the regression model of Example 11.1 based on Harvey's model are shown in Table 11.3 with the ordinary least squares results. The scedastic function is

$$\sigma_i^2 = \exp\left(\gamma_1 + \gamma_2\, \text{income}_i + \gamma_3\, \text{income}_i^2\right).$$

The estimates are consistent with the earlier results in suggesting that Income and its square significantly explain variation in the disturbance variances across observations. The 95 percent critical value for a chi-squared test with two degrees of freedom is 5.99, so all three test statistics lead to rejection of the hypothesis of homoscedasticity.

11.7.2  GROUPWISE HETEROSCEDASTICITY

A groupwise heteroscedastic regression has structural equations

$$y_i = x_i'\beta + \varepsilon_i, \quad i = 1, \ldots, n,$$
$$E[\varepsilon_i \mid x_i] = 0, \quad i = 1, \ldots, n.$$

The $n$ observations are grouped into $G$ groups, each with $n_g$ observations. The slope vector is the same in all groups, but within group $g$:

$$\text{Var}[\varepsilon_{ig} \mid x_{ig}] = \sigma_g^2, \quad i = 1, \ldots, n_g.$$

If the variances are known, then the GLS estimator is

$$\hat{\beta} = \left[\sum_{g=1}^{G} \frac{1}{\sigma_g^2} X_g'X_g\right]^{-1} \left[\sum_{g=1}^{G} \frac{1}{\sigma_g^2} X_g'y_g\right]. \tag{11-22}$$

Since $X_g'y_g = X_g'X_g b_g$, where $b_g$ is the OLS estimator in the $g$th subset of observations,

$$\hat{\beta} = \left[\sum_{g=1}^{G} \frac{1}{\sigma_g^2} X_g'X_g\right]^{-1} \left[\sum_{g=1}^{G} \frac{1}{\sigma_g^2} X_g'X_g b_g\right] = \left[\sum_{g=1}^{G} V_g\right]^{-1} \left[\sum_{g=1}^{G} V_g b_g\right] = \sum_{g=1}^{G} W_g b_g.$$

This result is a matrix weighted average of the $G$ least squares estimators. The weighting matrices are $W_g = \left[\sum_{g=1}^{G} (\text{Var}[b_g])^{-1}\right]^{-1} (\text{Var}[b_g])^{-1}$. The estimator with the smaller covariance matrix therefore receives the larger weight. (If $X_g$ is the same in every group, then the matrix $W_g$ reduces to the simple scalar, $w_g = h_g/\sum_g h_g$ where $h_g = 1/\sigma_g^2$.)

The preceding is a useful construction of the estimator, but it relies on an algebraic result that might be unusable. If the number of observations in any group is smaller than the number of regressors, then the group specific OLS estimator cannot be computed. But, as can be seen in (11-22), that is not what is needed to proceed; what is needed are the weights. As always, pooled least squares is a consistent estimator, which means that using the group specific subvectors of the OLS residuals,

$$\hat{\sigma}_g^2 = \frac{e_g'e_g}{n_g} \tag{11-23}$$

provides the needed estimator for the group specific disturbance variance. Thereafter, (11-22) is the estimator and the inverse matrix in that expression gives the estimator of the asymptotic covariance matrix.

Continuing this line of reasoning, one might consider iterating the estimator by returning to (11-23) with the two-step FGLS estimator, recomputing the weights, then returning to (11-22) to recompute the slope vector. This can be continued until convergence. It can be shown [see Oberhofer and Kmenta (1974)] that so long as (11-23) is used without a degrees of freedom correction, then if this does converge, it will do so at the maximum likelihood estimator (with normally distributed disturbances). Another method of estimating this model is to treat it as a form of Harvey's model of multiplicative heteroscedasticity where $z_i$ is a set (minus one) of group dummy variables.

For testing the homoscedasticity assumption in this model, one can use a likelihood ratio test. The log-likelihood function, assuming homoscedasticity, is

$$\ln L_0 = -(n/2)[1 + \ln 2\pi + \ln(e'e/n)],$$

where $n = \sum_g n_g$ is the total number of observations. Under the alternative hypothesis of heteroscedasticity across $G$ groups, the log-likelihood function is

$$\ln L_1 = -\frac{n}{2}\ln(2\pi) - \frac{1}{2}\sum_{g=1}^{G} n_g \ln \sigma_g^2 - \frac{1}{2}\sum_{g=1}^{G} \sum_{i=1}^{n_g} \left(\varepsilon_{ig}^2/\sigma_g^2\right). \tag{11-24}$$

The maximum likelihood estimators of $\sigma^2$ and $\sigma_g^2$ are $e'e/n$ and $\hat{\sigma}_g^2$ from (11-23), respectively. The OLS and maximum likelihood estimators of $\beta$ are used for the slope vector under the null and alternative hypothesis, respectively. If we evaluate $\ln L_0$ and $\ln L_1$ at these estimates, then the likelihood ratio test statistic for homoscedasticity is

$$-2(\ln L_0 - \ln L_1) = n \ln s^2 - \sum_{g=1}^{G} n_g \ln s_g^2.$$

Under the null hypothesis, the statistic has a limiting chi-squared distribution with $G - 1$ degrees of freedom.

Example 11.6  Heteroscedastic Cost Function for Airline Production
To illustrate the computations for the groupwise heteroscedastic model, we will reexamine the cost model for the total cost of production in the airline industry that was fit in Example 7.2. (A description of the data appears in the earlier example.) For a sample of six airlines observed annually for 15 years, we fit the cost function

$$\ln \text{cost}_{it} = \beta_1 + \beta_2 \ln \text{output}_{it} + \beta_3\, \text{load factor}_{it} + \beta_4 \ln \text{fuel price}_{it} + \delta_2 \text{Firm}_2 + \delta_3 \text{Firm}_3 + \delta_4 \text{Firm}_4 + \delta_5 \text{Firm}_5 + \delta_6 \text{Firm}_6 + \varepsilon_{it}.$$

Output is measured in "revenue passenger miles." The load factor is a rate of capacity utilization; it is the average rate at which seats on the airline's planes are filled. More complete models of costs include other factor prices (materials, capital) and, perhaps, a quadratic term in log output to allow for variable economies of scale. The "Firm$_j$" terms are firm specific dummy variables.

TABLE 11.4  Least Squares and Maximum Likelihood Estimates of a Groupwise Heteroscedasticity Model

               Least Squares: Homoscedastic         Maximum Likelihood
               Estimate   Std. Error   t Ratio      Estimate   Std. Error   t Ratio
$\beta_1$         9.706      0.193       50.25        10.057      0.134       74.86
$\beta_2$         0.418      0.0152      27.47         0.400      0.0108      37.12
$\beta_3$        -1.070      0.202       -5.30        -1.129      0.164       -7.87
$\beta_4$         0.919      0.0299      30.76         0.928      0.0228      40.86
$\delta_2$       -0.0412     0.0252      -1.64        -0.0487     0.0237      -2.06
$\delta_3$       -0.209      0.0428      -4.88        -0.200      0.0308      -6.49
$\delta_4$        0.185      0.0608       3.04         0.192      0.0499       3.852
$\delta_5$        0.0241     0.0799       0.30         0.0419     0.0594       0.71
$\delta_6$        0.0871     0.0842       1.03         0.0963     0.0631       1.572
$\gamma_1$                                            -7.088      0.365      -19.41
$\gamma_2$                                             2.007      0.516        3.89
$\gamma_3$                                             0.758      0.516        1.47
$\gamma_4$                                             2.239      0.516        4.62
$\gamma_5$                                             0.530      0.516        1.03
$\gamma_6$                                             1.053      0.516        2.04
$\sigma_1^2$      0.001479                             0.0008349
$\sigma_2^2$      0.004935                             0.006212
$\sigma_3^2$      0.001888                             0.001781
$\sigma_4^2$      0.005834                             0.009071
$\sigma_5^2$      0.002338                             0.001419
$\sigma_6^2$      0.003032                             0.002393
               $R^2 = 0.997$, $s^2 = 0.003613$, $\ln L = 130.0862$      $\ln L = 140.7591$

Ordinary least squares regression produces the set of results at the left side of Table 11.4. The variance estimates shown at the bottom of the table are the firm specific variance estimates in (11-23). The results so far are what one might expect. There are substantial economies of scale; $\text{e.s.}_{it} = (1/0.919) - 1 = 0.088$. The fuel price and load factors affect costs in the predictable fashions as well. (Fuel prices differ because of different mixes of types and regional differences in supply characteristics.) The second set of results shows the model of groupwise heteroscedasticity. From the least squares variance estimates in the first set of results, which are quite different, one might guess that a test of homoscedasticity would lead to rejection of the hypothesis. The easiest computation is the likelihood ratio test. Based on the log likelihood functions in the last row of the table, the test statistic, which has a limiting chi-squared distribution with 5 degrees of freedom, equals 21.3458. The critical value from the table is 11.07, so the hypothesis of homoscedasticity is rejected.

11.8  AUTOREGRESSIVE CONDITIONAL HETEROSCEDASTICITY

Heteroscedasticity is often associated with cross-sectional data, whereas time series are usually studied in the context of homoscedastic processes. In analyses of macroeconomic data, Engle (1982, 1983) and Cragg (1982) found evidence that for some kinds of data, the disturbance variances in time-series models were less stable than usually assumed. Engle's results suggested that in models of inflation, large and small forecast errors appeared to occur in clusters, suggesting a form of heteroscedasticity in which the variance of the forecast error depends on the size of the previous disturbance. He suggested the autoregressive, conditionally heteroscedastic, or ARCH, model as an alternative to the usual time-series process. More recent studies of financial markets suggest that the phenomenon is quite common. The ARCH model has proven to be useful in studying the volatility of inflation [Coulson and Robins (1985)], the term structure of interest rates [Engle, Hendry, and Trumbull (1985)], the volatility of stock market returns [Engle, Lilien, and Robins (1987)], and the behavior of foreign exchange markets [Domowitz and Hakkio (1985) and Bollerslev and Ghysels (1996)], to name but a few. This section will describe specification, estimation, and testing, in the basic formulations of the ARCH model and some extensions.$^{21}$

Example 11.7  Stochastic Volatility
Figure 11.3 shows Bollerslev and Ghysels's 1,974 observations on the daily percentage nominal return for the Deutschmark/Pound exchange rate. (These data are given in Appendix Table F11.1.) The variation in the series appears to be fluctuating, with several clusters of large and small movements.

11.8.1  THE ARCH(1) MODEL

The simplest form of this model is the ARCH(1) model,

$$y_t = \beta'x_t + \varepsilon_t, \tag{11-25}$$
$$\varepsilon_t = u_t \sqrt{\alpha_0 + \alpha_1 \varepsilon_{t-1}^2},$$

where $u_t$ is distributed as standard normal.$^{22}$ It follows that $E[\varepsilon_t \mid x_t, \varepsilon_{t-1}] = 0$, so that $E[\varepsilon_t \mid x_t] = 0$ and $E[y_t \mid x_t] = \beta'x_t$. Therefore, this model is a classical regression model. But

$$\text{Var}[\varepsilon_t \mid \varepsilon_{t-1}] = E[\varepsilon_t^2 \mid \varepsilon_{t-1}] = E[u_t^2]\left[\alpha_0 + \alpha_1 \varepsilon_{t-1}^2\right] = \alpha_0 + \alpha_1 \varepsilon_{t-1}^2,$$

so $\varepsilon_t$ is conditionally heteroscedastic, not with respect to $x_t$ as we considered in the preceding sections, but with respect to $\varepsilon_{t-1}$. The unconditional variance of $\varepsilon_t$ is

$$\text{Var}[\varepsilon_t] = \text{Var}\{E[\varepsilon_t \mid \varepsilon_{t-1}]\} + E\{\text{Var}[\varepsilon_t \mid \varepsilon_{t-1}]\} = \alpha_0 + \alpha_1 E[\varepsilon_{t-1}^2] = \alpha_0 + \alpha_1 \text{Var}[\varepsilon_{t-1}].$$

$^{21}$ Engle and Rothschild (1992) give a recent survey of this literature which describes many extensions. Mills (1993) also presents several applications. See, as well, Bollerslev (1986) and Li, Ling, and McAleer (2001). See McCullough and Renfro (1999) for discussion of estimation of this model.

$^{22}$ The assumption that $u_t$ has unit variance is not a restriction. The scaling implied by any other variance would be absorbed by the other parameters.

FIGURE 11.3  Nominal Exchange Rate Returns. (Vertical axis: Y; horizontal axis: Observ.#.)

If the process generating the disturbances is weakly (covariance) stationary (see Definition 12.2),$^{23}$ then the unconditional variance is not changing over time so

$$\text{Var}[\varepsilon_t] = \text{Var}[\varepsilon_{t-1}] = \alpha_0 + \alpha_1 \text{Var}[\varepsilon_{t-1}] = \frac{\alpha_0}{1 - \alpha_1}.$$

For this ratio to be finite and positive, $|\alpha_1|$ must be less than 1. Then, unconditionally, $\varepsilon_t$ is distributed with mean zero and variance $\sigma^2 = \alpha_0/(1 - \alpha_1)$. Therefore, the model obeys the classical assumptions, and ordinary least squares is the most efficient linear unbiased estimator of $\beta$.

But there is a more efficient nonlinear estimator. The log-likelihood function for this model is given by Engle (1982). Conditioned on starting values $y_0$ and $x_0$ (and $\varepsilon_0$), the conditional log-likelihood for observations $t = 1, \ldots, T$ is the one we examined in Section 11.6.2 for the general heteroscedastic regression model [see (11-19)],

$$\ln L = -\frac{T}{2}\ln(2\pi) - \frac{1}{2}\sum_{t=1}^{T} \ln\left(\alpha_0 + \alpha_1 \varepsilon_{t-1}^2\right) - \frac{1}{2}\sum_{t=1}^{T} \frac{\varepsilon_t^2}{\alpha_0 + \alpha_1 \varepsilon_{t-1}^2}, \quad \varepsilon_t = y_t - \beta'x_t. \tag{11-26}$$

Maximization of $\log L$ can be done with the conventional methods, as discussed in Appendix E.$^{24}$

$^{23}$ This discussion will draw on the results
and terminology of time series analysis in Section 12.3 and Chapter 20. The reader may wish to peruse this material at this point.

$^{24}$ Engle (1982) and Judge et al. (1985, pp. 441–444) suggest a four-step procedure based on the method of scoring that resembles the two-step method we used for the multiplicative heteroscedasticity model in Section 11.6. However, the full MLE is now incorporated in most modern software, so the simple regression based methods, which are difficult to generalize, are less attractive in the current literature. But, see McCullough and Renfro (1999) and Fiorentini, Calzolari and Panattoni (1996) for commentary and some cautions related to maximum likelihood estimation.

11.8.2  ARCH(q), ARCH-IN-MEAN AND GENERALIZED ARCH MODELS

The natural extension of the ARCH(1) model presented before is a more general model with longer lags. The ARCH($q$) process,

$$\sigma_t^2 = \alpha_0 + \alpha_1 \varepsilon_{t-1}^2 + \alpha_2 \varepsilon_{t-2}^2 + \cdots + \alpha_q \varepsilon_{t-q}^2,$$

is a $q$th order moving average [MA($q$)] process. (Much of the analysis of the model parallels the results in Chapter 20 for more general time series models.) [Once again, see Engle (1982).] This section will generalize the ARCH($q$) model, as suggested by Bollerslev (1986), in the direction of the autoregressive-moving average (ARMA) models of Section 20.2.1. The discussion will parallel his development, although many details are omitted for brevity. The reader is referred to that paper for background and for some of the less critical details.

The capital asset pricing model (CAPM) is discussed briefly in Chapter 14. Among the many variants of this model is an intertemporal formulation by Merton (1980) that suggests an approximate linear relationship between the return and variance of the market portfolio. One of the possible flaws in this model is its assumption of a constant variance of the market portfolio. In this connection, then, the ARCH-in-Mean, or ARCH-M, model suggested by Engle, Lilien, and Robins (1987) is a natural extension. The model states that

$$y_t = \beta'x_t + \delta \sigma_t^2 + \varepsilon_t,$$
$$\text{Var}[\varepsilon_t \mid \Psi_t] = \text{ARCH}(q).$$

Among the interesting implications of this modification of the standard model is that under certain assumptions, $\delta$ is the coefficient of relative risk aversion. The ARCH-M model has been applied in a wide variety of studies of volatility in asset returns
, including the daily Standard and Poor's Index [French, Schwert, and Stambaugh (1987)] and weekly New York Stock Exchange returns [Chou (1988)]. A lengthy list of applications is given in Bollerslev, Chou, and Kroner (1992).

The ARCH-M model has several noteworthy statistical characteristics. Unlike the standard regression model, misspecification of the variance function does impact on the consistency of estimators of the parameters of the mean. [See Pagan and Ullah (1988) for formal analysis of this point.] Recall that in the classical regression setting, weighted least squares is consistent even if the weights are misspecified as long as the weights are uncorrelated with the disturbances. That is not true here. If the ARCH part of the model is misspecified, then conventional estimators of $\beta$ and $\delta$ will not be consistent. Bollerslev, Chou, and Kroner (1992) list a large number of studies that called into question the specification of the ARCH-M model, and they subsequently obtained quite different results after respecifying the model. A closely related practical problem is that the mean and variance parameters in this model are no longer uncorrelated. In analysis up to this point, we made quite profitable use of the block diagonality of the Hessian of the log-likelihood function for the model of heteroscedasticity. But the Hessian for the ARCH-M model is not block diagonal. In practical terms, the estimation problem cannot be segmented as we have done previously with the heteroscedastic regression model. All the parameters must be estimated simultaneously.

The model of generalized autoregressive conditional heteroscedasticity (GARCH) is defined as follows.$^{25}$ The underlying regression is the usual one in (11-25). Conditioned on an information set at time $t$, denoted $\Psi_t$, the distribution of the disturbance is assumed to be

$$\varepsilon_t \mid \Psi_t \sim N[0, \sigma_t^2],$$

where the conditional variance is

$$\sigma_t^2 = \alpha_0 + \delta_1 \sigma_{t-1}^2 + \delta_2 \sigma_{t-2}^2 + \cdots + \delta_p \sigma_{t-p}^2 + \alpha_1 \varepsilon_{t-1}^2 + \alpha_2 \varepsilon_{t-2}^2 + \cdots + \alpha_q \varepsilon_{t-q}^2. \tag{11-27}$$

Define

$$z_t = \left[1, \sigma_{t-1}^2, \sigma_{t-2}^2, \ldots, \sigma_{t-p}^2, \varepsilon_{t-1}^2, \varepsilon_{t-2}^2, \ldots, \varepsilon_{t-q}^2\right]'$$

and

$$\gamma = [\alpha_0, \delta_1, \delta_2, \ldots, \delta_p, \alpha_1, \ldots, \alpha_q]' = [\alpha_0, \delta', \alpha']'.$$

Then

$$\sigma_t^2 = \gamma'z_t.$$

Notice that the conditional variance is defined by an autoregressive-moving average [ARMA($p$, $q$)] process in the innovations $\varepsilon_t^2$, exactly as in Section 20.2.1. The difference
here is that the mean of the random variable of interest $y_t$ is described completely by a heteroscedastic, but otherwise ordinary, regression model. The conditional variance, however, evolves over time in what might be a very complicated manner, depending on the parameter values and on $p$ and $q$. The model in (11-27) is a GARCH($p$, $q$) model, where $p$ refers, as before, to the order of the autoregressive part.$^{26}$ As Bollerslev (1986) demonstrates with an example, the virtue of this approach is that a GARCH model with a small number of terms appears to perform as well as or better than an ARCH model with many.

The stationarity conditions discussed in Section 20.2.2 are important in this context to ensure that the moments of the normal distribution are finite. The reason is that higher moments of the normal distribution are finite powers of the variance. A normal distribution with variance $\sigma_t^2$ has fourth moment $3\sigma_t^4$, sixth moment $15\sigma_t^6$, and so on. [The precise relationship of the even moments of the normal distribution to the variance is $\mu_{2k} = (\sigma^2)^k (2k)!/(k!\,2^k)$.] Simply ensuring that $\sigma_t^2$ is stable does not ensure that higher powers are as well.$^{27}$ Bollerslev presents a useful figure that shows the conditions needed to ensure stability for moments up to order 12 for a GARCH(1, 1) model and gives some additional discussion. For example, for a GARCH(1, 1) process, for the fourth moment to exist, $3\alpha_1^2 + 2\alpha_1\delta_1 + \delta_1^2$ must be less than 1.

$^{25}$ As have most areas in time-series econometrics, the line of literature on GARCH models has progressed rapidly in recent years and will surely continue to do so. We have presented Bollerslev's model in some detail, despite many recent extensions, not only to introduce the topic as a bridge to the literature, but also because it provides a convenient and interesting setting in which to discuss several related topics such as double-length regression and pseudo-maximum likelihood estimation.

$^{26}$ We have changed Bollerslev's notation slightly so as not to conflict with our previous presentation. He used $\beta$ instead of our $\delta$ in (11-27) and $b$ instead of our $\beta$ in (11-25).

$^{27}$ The conditions cannot be imposed a priori. In fact, there is no nonzero set of parameters that guarantees stability of all moments, even though the normal distribution has finite moments of all orders. As such, the normality assumption must be viewed as an approximation.

It is convenient to write (11-27) in terms of polynomials in the lag operator:

$$\sigma_t^2 = \alpha_0 + D(L)\sigma_t^2 + A(L)\varepsilon_t^2.$$

As discussed in Section 20.2.2, the stationarity condition for such an equation is that the roots of the characteristic equation, $1 - D(z) = 0$, must lie outside the unit circle. For the present, we will assume that this case is true for the model we are considering and that $A(1) + D(1) < 1$. [This assumption is stronger than that needed to ensure stationarity in a higher-order autoregressive model, which would depend only on $D(L)$.] The implication is that the GARCH process is covariance stationary with $E[\varepsilon_t] = 0$ (unconditionally), $\text{Var}[\varepsilon_t] = \alpha_0/[1 - A(1) - D(1)]$, and $\text{Cov}[\varepsilon_t, \varepsilon_s] = 0$ for all $t \neq s$. Thus, unconditionally the model is the classical regression model that we examined in Chapters 2–8.

The usefulness of the GARCH specification is that it allows the variance to evolve over time in a way that is much more general than the simple specification of the ARCH model. The comparison between simple finite-distributed lag models and the dynamic regression model discussed in Chapter 19 is analogous. For the example discussed in his paper, Bollerslev reports that although Engle and Kraft's (1983) ARCH(8) model for the rate of inflation in the GNP deflator appears to remove all ARCH effects, a closer look reveals GARCH effects at several lags. By fitting a GARCH(1, 1) model to the same data, Bollerslev finds that the ARCH effects out to the same eight-period lag as fit by Engle and Kraft and his observed GARCH effects are all satisfactorily accounted for.

11.8.3  MAXIMUM LIKELIHOOD ESTIMATION OF THE GARCH MODEL

Bollerslev describes a method of estimation based on the BHHH algorithm. As he shows, the method is relatively simple, although with the line search and first derivative method that he suggests, it probably involves more computation and more iterations than necessary. Following the suggestions of Harvey (1976), it turns out that there is a simpler way to estimate the GARCH model that is also very illuminating. This model is actually very similar to the more conventional model of multiplicative heteroscedasticity that we examined in Section 11.7.1.

For normally distributed disturbances, the log-likelihood for a sample of $T$ observations is$^{28}$

$$\ln L = \sum_{t=1}^{T} -\frac{1}{2}\left[\ln(2\pi) + \ln \sigma_t^2 + \frac{\varepsilon_t^2}{\sigma_t^2}\right] = \sum_{t=1}^{T} \ln f_t(\theta) = \sum_{t=1}^{T} l_t(\theta),$$

where $\varepsilon_t = y_t - x_t'\beta$ and $\theta = (\beta', \alpha_0, \alpha', \delta')' = (\beta', \gamma')'$. Derivatives of $\ln L$ are obtained by summation. Let $l_t$ denote $\ln f_t(\theta)$. The first derivatives with respect to the variance parameters are

$$\frac{\partial l_t}{\partial \gamma} = -\frac{1}{2}\left[\frac{1}{\sigma_t^2} - \frac{\varepsilon_t^2}{(\sigma_t^2)^2}\right]\frac{\partial \sigma_t^2}{\partial \gamma} = \frac{1}{2}\left(\frac{1}{\sigma_t^2}\right)\frac{\partial \sigma_t^2}{\partial \gamma}\left(\frac{\varepsilon_t^2}{\sigma_t^2} - 1\right) = \frac{1}{2}\left(\frac{1}{\sigma_t^2}\right) g_t v_t = b_t v_t. \tag{11-28}$$

$^{28}$ There are three minor errors in Bollerslev's derivation that we note here to avoid the apparent inconsistencies. In his (22), $\frac{1}{2}h_t$ should be $\frac{1}{2}h_t^{-1}$. In (23), $-2h_t^{-2}$ should be $-h_t^{-2}$. In (28), $h\,\partial h/\partial \omega$ should, in each case, be $(1/h)\,\partial h/\partial \omega$. [In his (8), $\alpha_0\alpha_1$ should be $\alpha_0 + \alpha_1$, but this has no implications for our derivation.]

Note that $E[v_t] = 0$. Suppose, for now, that there are no regression parameters. Newton's method for estimating the variance parameters would be

$$\hat{\gamma}^{i+1} = \hat{\gamma}^{i} - H^{-1}g, \tag{11-29}$$

where $H$ indicates the Hessian and $g$ is the first derivatives vector. Following Harvey's suggestion (see Section 11.7.1), we will use the method of scoring instead. To do this, we make use of $E[v_t] = 0$ and $E[\varepsilon_t^2/\sigma_t^2] = 1$. After taking expectations in (11-28), the iteration reduces to a linear regression of $v_{*t} = (1/\sqrt{2})v_t$ on regressors $w_{*t} = (1/\sqrt{2})g_t/\sigma_t^2$. That is,

$$\hat{\gamma}^{i+1} = \hat{\gamma}^{i} + [W_*'W_*]^{-1}W_*'v_* = \hat{\gamma}^{i} + [W_*'W_*]^{-1}\left(\frac{\partial \ln L}{\partial \gamma}\right), \tag{11-30}$$

where row $t$ of $W_*$ is $w_{*t}'$. The iteration has converged when the slope vector is zero, which happens when the first derivative vector is zero. When the iterations are complete, the estimated asymptotic covariance matrix is simply

$$\text{Est.Asy.Var}[\hat{\gamma}] = [\hat{W}_*'\hat{W}_*]^{-1}$$

based on the estimated parameters.

The usefulness of the result just given is that $E[\partial^2 \ln L/\partial \gamma\, \partial \beta']$ is, in fact, zero. Since the expected Hessian is block diagonal, applying the method of scoring to the full parameter vector can proceed in two parts, exactly as it did in Section 11.7.1 for the multiplicative heteroscedasticity model. That is, the updates for the mean and variance parameter vectors can be computed separately. Consider then the slope parameters, $\beta$. The same type of modified scoring method as used earlier produces the iteration

$$\hat{\beta}^{i+1} = \hat{\beta}^{i} + \left[\sum_{t=1}^{T} \frac{x_t x_t'}{\sigma_t^2} + \frac{1}{2}\left(\frac{d_t}{\sigma_t^2}\right)\left(\frac{d_t}{\sigma_t^2}\right)'\right]^{-1} \left[\sum_{t=1}^{T} \frac{x_t \varepsilon_t}{\sigma_t^2} + \frac{1}{2}\left(\frac{d_t}{\sigma_t^2}\right)v_t\right]$$

$$\phantom{\hat{\beta}^{i+1}} = \hat{\beta}^{i} + \left[\sum_{t=1}^{T} \frac{x_t x_t'}{\sigma_t^2} + \frac{1}{2}\left(\frac{d_t}{\sigma_t^2}\right)\left(\frac{d_t}{\sigma_t^2}\right)'\right]^{-1} \left(\frac{\partial \ln L}{\partial \beta}\right) \tag{11-31}$$

$$\phantom{\hat{\beta}^{i+1}} = \hat{\beta}^{i} + h^{i},$$

with $d_t = \partial \sigma_t^2/\partial \beta$, which has been referred to as a double-length regression. [See Orme (1990) and Davidson and MacKinnon (1993, Chapter 14).] The update vector $h^{i}$ is the vector of slopes in an augmented or double-length generalized regression,

$$h^{i} = [C'\Omega^{-1}C]^{-1}[C'\Omega^{-1}a], \tag{11-32}$$

where $C$ is a $2T \times K$ matrix whose first $T$ rows are the $X$ from the original regression model and whose next $T$ rows are $(1/\sqrt{2})\,d_t'/\sigma_t^2$, $t = 1, \ldots, T$; $a$ is a $2T \times 1$ vector whose first $T$ elements are $\varepsilon_t$ and whose next $T$ elements are $(1/\sqrt{2})\,v_t$, $t = 1, \ldots, T$; and $\Omega$ is a diagonal matrix with $\sigma_t^2$ in positions $1, \ldots, T$ and ones below observation $T$, so that $C'\Omega^{-1}C$ and $C'\Omega^{-1}a$ reproduce the two bracketed sums in (11-31). At convergence, $[C'\Omega^{-1}C]^{-1}$ provides the asymptotic covariance matrix for the MLE. The resemblance to the familiar result for the generalized regression model is striking, but note that this result is based on the double-length regression.

The iteration is done simply by computing the update vectors to the current parameters as defined above.$^{29}$ An important consideration is that to apply the scoring method, the estimates of $\beta$ and $\gamma$ are updated simultaneously. That is, one does not use the updated estimate of $\gamma$ in (11-30) to update the weights for the GLS regression to compute the new $\beta$ in (11-31). The same estimates (the results of the prior iteration) are used on the right-hand sides of both (11-30) and (11-31). The remaining problem is to obtain starting values for the iterations. One obvious choice is $b$, the OLS estimator, for $\beta$, $e'e/T = s^2$ for $\alpha_0$, and zero for all the remaining parameters. The OLS slope vector will be consistent under all specifications. A useful alternative in this context would be to start $\alpha$ at the vector of slopes in the least squares regression of $e_t^2$, the squared OLS residual, on a constant and $q$ lagged values.$^{30}$ As discussed below, an LM test for the presence of GARCH effects is then a by-product of the first iteration. In principle, the updated result of the first iteration is an efficient two-step estimator of all the parameters. But having gone to the full effort to set up the iterations, nothing is gained by not iterating to convergence. One virtue of allowing the procedure to iterate to convergence is that the resulting log-likelihood function can be used in likelihood ratio tests.

11.8.4  TESTING FOR GARCH EFFECTS

The preceding development appears fairly complicated. In fact, it is not, since at each step, nothing more than a linear least squares regression is required. The intricate part of the computation is setting up the derivatives. On the other hand
, it does take a fair amount of programming to get this far.$^{31}$ As Bollerslev suggests, it might be useful to test for GARCH effects first.

The simplest approach is to examine the squares of the least squares residuals. The autocorrelations (correlations with lagged values) of the squares of the residuals provide evidence about ARCH effects. An LM test of ARCH($q$) against the hypothesis of no ARCH effects [ARCH(0), the classical model] can be carried out by computing $\chi^2 = TR^2$ in the regression of $e_t^2$ on a constant and $q$ lagged values. Under the null hypothesis of no ARCH effects, the statistic has a limiting chi-squared distribution with $q$ degrees of freedom. Values larger than the critical table value give evidence of the presence of ARCH (or GARCH) effects.

Bollerslev suggests a Lagrange multiplier statistic that is, in fact, surprisingly simple to compute. The LM test for GARCH($p$, 0) against GARCH($p$, $q$) can be carried out by referring $T$ times the $R^2$ in the linear regression defined in (11-30) to the chi-squared critical value with $q$ degrees of freedom. There is, unfortunately, an indeterminacy in this test procedure. The test for ARCH($q$) against GARCH($p$, $q$) is exactly the same as that for ARCH($p$) against ARCH($p + q$). For carrying out the test, one can use as starting values a set of estimates that includes $\delta = 0$ and any consistent estimators for $\beta$ and $\alpha$. Then $TR^2$ for the regression at the initial iteration provides the test statistic.$^{33}$

A number of recent papers have questioned the use of test statistics based solely on normality. Wooldridge (1991) is a useful summary with several examples.

$^{29}$ See Fiorentini et al. (1996) on computation of derivatives in GARCH models.

$^{30}$ A test for the presence of $q$ ARCH effects against none can be carried out by carrying $TR^2$ from this regression into a table of critical values for the chi-squared distribution. But in the presence of GARCH effects, this procedure loses its validity.

$^{31}$ Since this procedure is available as a preprogrammed procedure in many computer programs, including TSP, E-Views, Stata, RATS, LIMDEP, and Shazam, this warning might itself be overstated.

TABLE 11.5  Maximum Likelihood Estimates of a GARCH(1, 1) Model$^{32}$

                 $\mu$      $\alpha_0$   $\alpha_1$    $\delta$    $\alpha_0/(1 - \alpha_1 - \delta)$
Estimate      -0.006190     0.01076       0.1531       0.8060       0.2631
Std. Error      0.00873     0.00312       0.0273       0.0302       0.594
t ratio          -0.709       3.445        5.605       26.731       0.443
$\ln L = -1106.61$, $\ln L_{OLS} = -1311.09$, $\bar{y} = -0.01642$, $s^2 = 0.221128$

Example 11.8  GARCH Model for Exchange Rate Volatility
Bollerslev and Ghysels analyzed the exchange rate data in Example 11.7 using a GARCH(1, 1) model,

$$y_t = \mu + \varepsilon_t,$$
$$E[\varepsilon_t \mid \varepsilon_{t-1}] = 0,$$
$$\text{Var}[\varepsilon_t \mid \varepsilon_{t-1}] = \sigma_t^2 = \alpha_0 + \alpha_1 \varepsilon_{t-1}^2 + \delta \sigma_{t-1}^2.$$

The least squares residuals for this model are simply $e_t = y_t - \bar{y}$. Regression of the squares of these residuals on a constant and 10 lagged squared values using observations 11–1974 produces an $R^2 = 0.025255$. With $T = 1964$, the chi-squared statistic is 49.60, which is larger than the critical value from the table of 18.31. We conclude that there is evidence of GARCH effects in these residuals. The maximum likelihood estimates of the GARCH model are given in Table 11.5. Note the resemblance between the OLS unconditional variance (0.221128) and the estimated equilibrium variance from the GARCH model, 0.2631.

$^{32}$ These data have become a standard data set for the evaluation of software for estimating GARCH models. The values given are the benchmark estimates. Standard errors differ substantially from one method to the next. Those given are the Bollerslev and Wooldridge (1992) results. See McCullough and Renfro (1999).

$^{33}$ Bollerslev argues that in view of the complexity of the computations involved in estimating the GARCH model, it is useful to have a test for GARCH effects. This case is one (as are many other maximum likelihood problems) in which the apparatus for carrying out the test is the same as that for estimating the model, however. Having computed the LM statistic for GARCH effects, one can proceed to estimate the model just by allowing the program to iterate to convergence. There is no additional cost beyond waiting for the answer.

11.8.5  PSEUDO-MAXIMUM LIKELIHOOD ESTIMATION

We now consider an implication of nonnormality of the disturbances. Suppose that the assumption of normality is weakened to only

$$E[\varepsilon_t \mid \Psi_t] = 0, \quad E\left[\frac{\varepsilon_t^2}{\sigma_t^2} \,\Big|\, \Psi_t\right] = 1, \quad E\left[\frac{\varepsilon_t^4}{\sigma_t^4} \,\Big|\, \Psi_t\right] = \kappa < \infty,$$

where $\sigma_t^2$ is as defined earlier. Now the normal log-likelihood function is inappropriate. In this case, the nonlinear (ordinary or weighted) least squares estimator would have the properties discussed in Chapter 9. It would be more difficult to compute than the MLE discussed earlier, however. It has been shown [see White (1982a) and Weiss (1982)] that the pseudo-MLE obtained by maximizing the same log-likelihood as if it were correct produces a consistent estimator despite the misspecification.$^{34}$ The asymptotic covariance matrices for the parameter estimators must be adjusted, however.

The general result for cases such as this one [see Gourieroux, Monfort, and Trognon (1984)] is that the appropriate asymptotic covariance matrix for the pseudo-MLE of a parameter vector $\theta$ would be

$$\text{Asy.Var}[\hat{\theta}] = H^{-1}FH^{-1}, \tag{11-33}$$

where

$$H = -E\left[\frac{\partial^2 \ln L}{\partial \theta\, \partial \theta'}\right]$$

and

$$F = E\left[\left(\frac{\partial \ln L}{\partial \theta}\right)\left(\frac{\partial \ln L}{\partial \theta'}\right)\right]$$

(that is, the BHHH estimator), and $\ln L$ is the used but inappropriate log-likelihood function. For current purposes, $H$ and $F$ are still block diagonal, so we can treat the mean and variance parameters separately. In addition, $E[v_t]$ is still zero, so the second derivative terms in both blocks are quite simple. (The parts involving $\partial^2 \sigma_t^2/\partial \gamma\, \partial \gamma'$ and $\partial^2 \sigma_t^2/\partial \beta\, \partial \beta'$ fall out of the expectation.) Taking expectations and inserting the parts produces the corrected asymptotic covariance matrix for the variance parameters:

$$\text{Asy.Var}[\hat{\gamma}_{PMLE}] = [W_*'W_*]^{-1}B'B[W_*'W_*]^{-1},$$

where the rows of $W_*$ are defined in (11-30) and those of $B$ are in (11-28). For the slope parameters, the adjusted asymptotic covariance matrix would be

$$\text{Asy.Var}[\hat{\beta}_{PMLE}] = [C'\Omega^{-1}C]^{-1}\left[\sum_{t=1}^{T} b_t b_t'\right][C'\Omega^{-1}C]^{-1},$$

where the outer matrix is defined in (11-31) and, from the first derivatives given in (11-29) and (11-31),

$$b_t = \frac{x_t \varepsilon_t}{\sigma_t^2} + \frac{1}{2}\left(\frac{v_t}{\sigma_t^2}\right)d_t.^{35}$$

11.9  SUMMARY AND CONCLUSIONS

This chapter has analyzed one form of the generalized regression model, the model of heteroscedasticity. We first considered least squares estimation. The primary result for

$^{34}$ White (1982a) gives some additional requirements for the true underlying density of $\varepsilon_t$. Gourieroux, Monfort, and Trognon (1984) also consider the issue. Under the assumptions given, the expectations of the matrices in (18-27) and (18-32) remain the same as under normality. The consistency and asymptotic normality of the pseudo-MLE can be argued under the logic of GMM estimators.

$^{35}$ McCullough and Renfro (1999) examined several approaches to computing an appropriate asymptotic covariance matrix for the GARCH model, including the conventional Hessian and BHHH estimators and three sandwich style estimators including the one suggested above, and two based on the method of scoring suggested by Bollerslev and
Wooldridge(1992).Nonestandoutasobviouslybetter,buttheBollerslevandQMLEestimatorbasedonanactualHessianappearstoperformwellinMonteCarlostudies.\nGreene-50240bookJune17,200216:21CHAPTER11✦Heteroscedasticity247leastsquaresestimationisthatitretainsitsconsistencyandasymptoticnormality,butsomecorrectiontotheestimatedasymptoticcovariancematrixmaybeneededforap-propriateinference.TheWhiteestimatoristhestandardapproachforthiscomputation.ThesetworesultsalsoconstitutetheGMMestimatorforthismodel.Afterexaminingsomegeneraltestsforheteroscedasticity,wethennarrowedthemodeltosomespecificparametricforms,andconsideredweighted(generalized)leastsquaresandmaximumlikelihoodestimation.Iftheformoftheheteroscedasticityisknownbutinvolvesun-knownparameters,thenitremainsuncertainwhetherFGLScorrectionsarebetterthanOLS.Asymptotically,thecomparisonisclear,butinsmallormoderatelysizedsamples,theadditionalvariationincorporatedbytheestimatedvarianceparametersmayoff-setthegainstoGLS.Thefinalsectionofthischapterexaminedamodelofstochasticvolatility,theGARCHmodel.Thismodelhasprovedespeciallyusefulforanalyzingfinancialdatasuchasexchangerates,inflation,andmarketreturns.KeyTermsandConcepts•ARCHmodel•Lagrangemultipliertest•Robustnesstounknown•ARCH-in-mean•Heteroscedasticityheteroscedasticity•Breusch–Pagantest•Likelihoodratiotest•Stationaritycondition•Double-lengthregression•Maximumlikelihood•Stochasticvolatility•Efficienttwo-stepestimatorestimators•Two-stepestimator•GARCHmodel•Modelbasedtest•Waldtest•Generalizedleastsquares•Movingaverage•Weightedleastsquares•Generalizedsumofsquares•Multiplicative•Whiteestimator•GMMestimatorheteroscedasticity•White’stest•Goldfeld–Quandttest•Nonconstructivetest•Groupwise•Residualbasedtestheteroscedasticity•RobustestimatorExercises1.Supposethattheregressionmodelisyi=µ+εi,whereE[εi|xi]=0,Cov[ε,ε|x,x]=0fori=j,butVar[ε|x]=σ2x2,x>0.ijijiiiia.Givenasampleofobservationsonyiandxi,whatisthemostefficientestimatorofµ?Whatisitsvariance?b.WhatistheOLSestimatorofµ,andwhatisthevarianceoftheordi
naryleastsquaresestimator?c.Provethattheestimatorinpartaisatleastasefficientastheestimatorinpartb.2.Forthemodelinthepreviousexercise,whatistheprobabilitylimitofs2=1n22ni=1(yi−y¯)?Notethatsistheleastsquaresestimatoroftheresidualvariance.ItisalsontimestheconventionalestimatorofthevarianceoftheOLSestimator,s2Est.Var[y¯]=s2(XX)−1=.nHowdoesthisequationcomparewiththetruevalueyoufoundinpartbofExer-cise1?Doestheconventionalestimatorproducethecorrectestimateofthetrueasymptoticvarianceoftheleastsquaresestimator?\nGreene-50240bookJune17,200216:21248CHAPTER11✦Heteroscedasticity3.Twosamplesof50observationseachproducethefollowingmomentmatrices.(Ineachcase,Xisaconstantandonevariable.)Sample1Sample2
5030050300XX30021003002100yX[3002000][3002200]yy21002800a.Computetheleastsquaresregressioncoefficientsandtheresidualvariancess2foreachdataset.ComputetheR2foreachregression.b.ComputetheOLSestimateofthecoefficientvectorassumingthatthecoefficientsanddisturbancevariancearethesameinthetworegressions.Alsocomputetheestimateoftheasymptoticcovariancematrixoftheestimate.c.Testthehypothesisthatthevariancesinthetworegressionsarethesamewithoutassumingthatthecoefficientsarethesameinthetworegressions.d.Computethetwo-stepFGLSestimatorofthecoefficientsintheregressions,assumingthattheconstantandslopearethesameinbothregressions.Computetheestimateofthecovariancematrixandcompareitwiththeresultofpartb.4.UsingthedatainExercise3,usetheOberhofer–Kmentamethodtocomputethemaximumlikelihoodestimateofthecommoncoefficientvector.5.Thisexerciseisbasedonthefollowingdataset.50ObservationsonY:−1.422.752.10−5.081.491.000.16−1.111.66−0.26−4.875.942.21−6.870.901.612.11−3.82−0.627.0126.147.390.791.931.97−23.17−2.52−1.26−0.153.41−5.451.311.522.043.006.315.51−15.22−1.47−1.486.661.782.62−5.16−4.71−0.35−0.481.240.691.9150ObservationsonX1:−1.651.480.770.670.680.23−0.40−1.130.15−0.630.340.350.790.77−1.040.280.58−0.41−1.781.250.221.25−0.120.661.06−0.66−1.18−0.80−1.320.161.06−0.600.790.862.04−0.510.020.33−1.990.70−0.170.330.481.90−0.18−0.18−1.620.390.171.0250ObservationsonX2:−0.670.700.322.88−0.19−1.28−2.72−0.70−1.55−0.74−1.871.560.37−2.071.200.26−1.34−2.100.612.324.382.161.510.30−0.177.82−1.151.772.92−1.942.091.50−0.460.19−0.391.541.87−3.45−0.88−1.531.42−2.701.77−1.89−1.852.011.26−2.021.91−2.23a.ComputetheordinaryleastsquaresregressionofYonaconstant,X1,andX2.BesuretocomputetheconventionalestimatoroftheasymptoticcovariancematrixoftheOLSestimatoraswell.\nGreene-50240bookJune17,200216:21CHAPTER11✦Heteroscedasticity249b.ComputetheWhiteestimatoroftheappropriateasymptoticcovariancematrixfortheOLSestimates.c.TestforthepresenceofheteroscedasticityusingWhite’sgeneraltest.Doyourresultssuggestthenatureoftheheteroscedastici
ty?d.UsetheBreusch–PaganLagrangemultipliertesttotestforheteroscedasticity.e.SortthedatakeyingonX1andusetheGoldfeld–Quandttesttotestforhet-eroscedasticity.Repeattheprocedure,usingX2.Whatdoyoufind?6.UsingthedataofExercise5,reestimatetheparametersusingatwo-stepFGLSestimator.TrytheestimatorusedinExample11.4.7.ForthemodelinExercise1,supposethatεisnormallydistributed,withmeanzeroandvarianceσ2[1+(γx)2].Showthatσ2andγ2canbeconsistentlyestimatedbyaregressionoftheleastsquaresresidualsonaconstantandx2.Isthisestimatorefficient?8.Derivethelog-likelihoodfunction,first-orderconditionsformaximization,andinformationmatrixforthemodely=xβ+ε,ε∼N[0,σ2(γz)2].iiiii−y/(βx)9.Supposethatyhasthepdff(y|x)=(1/βx)e,y>0.ThenE[y|x]=βxandVar[y|x]=(βx)2.Forthismodel,provethatGLSandMLEarethesame,eventhoughthisdistributioninvolvesthesameparametersintheconditionalmeanfunctionandthedisturbancevariance.10.InthediscussionofHarvey’smodelinSection11.7,itisnotedthattheinitialestimatorofγ,theconstanttermintheregressionoflne2onaconstant,and1iziisinconsistentbytheamount1.2704.Harveypointsoutthatifthepurposeofthisinitialregressionisonlytoobtainstartingvaluesfortheiterations,thenthecorrectionisnotnecessary.Explainwhythisstatementwouldbetrue.11.(Thisexerciserequiresappropriatecomputersoftware.ThecomputationsrequiredcanbedonewithRATS,EViews,Stata,TSP,LIMDEP,andavarietyofothersoftwareusingonlypreprogrammedprocedures.)Quarterlydataontheconsumerpriceindexfor1950.1to2000.4aregiveninAppendixTableF5.1.UsethesedatatofitthemodelproposedbyEngleandKraft(1983).Themodelisπt=β0+β1πt−1+β2πt−2+β3πt−3+β4πt−4+εtwhereπt=100ln[pt/pt−1]andptisthepriceindex.a.Fitthemodelbyordinaryleastsquares,thenusethetestssuggestedinthetexttoseeifARCHeffectsappeartobepresent.b.TheauthorsfitanARCH(8)modelwithdecliningweights,89−iσ2=α+ε2t0t−i36i=1Fitthismodel.Ifthesoftwaredoesnotallowconstraintsonthecoefficients,youcanstilldothiswithatwo-stepleastsquaresprocedure,usingtheleastsquaresresidualsfromthefirststep.Whatdoyoufind?c.Bollerslev(1986)recomputedth
ismodelasaGARCH(1,1).UsetheGARCH(1,1)formandrefityourmodel.\nGreene-50240bookJune17,200214:112SERIALCORRELATIONQ12.1INTRODUCTIONTime-seriesdataoftendisplayautocorrelation,orserialcorrelationofthedisturbancesacrossperiods.Consider,forexample,theplotoftheleastsquaresresidualsinthefollowingexample.Example12.1MoneyDemandEquationTableF5.1containsquarterlydatafrom1950.1to2000.4ontheU.S.moneystock(M1)andoutput(realGDP)andthepricelevel(CPIU).Considerasimple(extremely)modelofmoneydemand,1lnM1t=β1+β2lnGDPt+β3lnCPIt+εtAplotoftheleastsquaresresidualsisshowninFigure12.1.Thepatternintheresidualssuggeststhatknowledgeofthesignofaresidualinoneperiodisagoodindicatorofthesignoftheresidualinthenextperiod.Thisknowledgesuggeststhattheeffectofagivendisturbanceiscarried,atleastinpart,acrossperiods.Thissortof“memory”inthedisturbancescreatesthelong,slowswingsfrompositivevaluestonegativeonesthatisevidentinFigure12.1.Onemightarguethatthispatternistheresultofanobviouslynaivemodel,butthatisoneoftheimportantpointsinthisdiscussion.Patternssuchasthisusuallydonotarisespontaneously;toalargeextent,theyare,indeed,aresultofanincompleteorflawedmodelspecification.Oneexplanationforautocorrelationisthatrelevantfactorsomittedfromthetime-seriesregression,likethoseincluded,arecorrelatedacrossperiods.Thisfactmaybeduetoserialcorrelationinfactorsthatshouldbeintheregressionmodel.Itiseasytoseewhythissituationwouldarise.Example12.2showsanobviouscase.Example12.2AutocorrelationInducedbyMisspecificationoftheModelInExamples2.3and7.6,weexaminedyearlytime-seriesdataontheU.S.gasolinemarketfrom1960to1995.TheevidenceintheexampleswasconvincingthataregressionmodelofvariationinlnG/popshouldinclude,ataminimum,aconstant,lnPGandlnincome/pop.Otherpricevariablesandatimetrendalsoprovidesignificantexplanatorypower,butthesetwoareabareminimum.Moreover,wealsofoundonthebasisofaChowtestofstructuralchangethatapparentlythismarketchangedstructurallyafter1974.Figure12.2displaysplotsoffoursetsofleastsquaresresiduals.Parts(a)through(c)showclearl
ythatasthespecificationoftheregressionisexpanded,theautocorrelationinthe“residuals”diminishes.Part(c)showstheeffectofforcingthecoefficientsintheequationtobethesamebothbeforeandafterthestructuralshift.Inpart(d),theresidualsinthetwosubperiods1960to1974and1975to1995areproducedbyseparateunrestrictedregressions.Thislattersetofresidualsisalmostnonautocorrelated.(Notealsothattherangeofvariationoftheresidualsfallsas1Sincethischapterdealsexclusivelywithtime-seriesdata,weshallusetheindextforobservationsandTforthesamplesizethroughout.250\nGreene-50240bookJune17,200214:1CHAPTER12✦SerialCorrelation251LeastSquaresResiduals.225.150.075.000Residual.075.150.225.30019501963197619892002QuarterFIGURE12.1AutocorrelatedResiduals.themodelisimproved,i.e.,asitsfitimproves.)ThefullequationisGtItln=β1+β2lnPGt+β3ln+β4lnPNCt+β5lnPUCtpoptpopt+β6lnPPTt+β7lnPNt+β8lnPDt+β9lnPSt+β10t+εt.Finally,weconsideranexampleinwhichserialcorrelationisananticipatedpartofthemodel.Example12.3NegativeAutocorrelationinthePhillipsCurveThePhillipscurve[Phillips(1957)]hasbeenoneofthemostintensivelystudiedrelationshipsinthemacroeconomicsliterature.Asoriginallyproposed,themodelspecifiesanegativere-lationshipbetweenwageinflationandunemploymentintheUnitedKingdomoveraperiodof100years.Recentresearchhasdocumentedasimilarrelationshipbetweenunemploymentandpriceinflation.Itisdifficulttojustifythemodelwhencastinsimplelevels;labormarkettheoriesoftherelationshiprelyonanuncomfortablepropositionthatmarketspersistentlyfallvictimtomoneyillusion,evenwhentheinflationcanbeanticipated.Currentresearch[e.g.,Staigeretal.(1996)]hasreformulatedashortrun(disequilibrium)“expectationsaug-mentedPhillipscurve”intermsofunexpectedinflationandunemploymentthatdeviatesfromalongrunequilibriumor“naturalrate.”Theexpectations-augmentedPhillipscurvecanbewrittenas∗pt−E[pt|t−1]=β[ut−u]+εtwhereptistherateofinflationinyeart,E[pt|t−1]istheforecastofptmadeinperiodt−1basedoninformationavailableattimet−1,,uistheunemploymentrateandu∗t−1tisthenatural,orequilibriumrate.
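The residual patterns described in Examples 12.1 and 12.2 can be summarized numerically by regressing the least squares residuals on their own first lag. The following is a minimal Python sketch on simulated data, not the Table F5.1 or gasoline market data. The variable `z`, the coefficient values, and the helper names `ols_residuals`, `lag1_slope`, and `simulate` are all hypothetical, chosen only to illustrate how omitting a serially correlated factor can induce autocorrelation in the residuals.

```python
import numpy as np


def ols_residuals(y, X):
    """Return the least squares residuals e = y - Xb."""
    b, *_ = np.linalg.lstsq(X, y, rcond=None)
    return y - X @ b


def lag1_slope(e):
    """Slope from regressing e_t on e_{t-1} (no constant term)."""
    return float(e[1:] @ e[:-1] / (e[:-1] @ e[:-1]))


def simulate(T=2000, rho=0.9, seed=0):
    """Compare residual autocorrelation with and without the factor z."""
    rng = np.random.default_rng(seed)
    x = rng.normal(size=T)

    # z is a hypothetical relevant factor that follows an AR(1) process.
    z = np.zeros(T)
    shocks = rng.normal(size=T)
    for t in range(1, T):
        z[t] = rho * z[t - 1] + shocks[t]

    eps = rng.normal(size=T)  # white-noise disturbance
    y = 1.0 + 0.5 * x + 1.0 * z + eps

    const = np.ones(T)
    e_short = ols_residuals(y, np.column_stack([const, x]))    # z omitted
    e_full = ols_residuals(y, np.column_stack([const, x, z]))  # z included
    return lag1_slope(e_short), lag1_slope(e_full)


slope_short, slope_full = simulate()
print(f"lag-1 slope, z omitted:  {slope_short:.3f}")
print(f"lag-1 slope, z included: {slope_full:.3f}")
```

Under these assumed settings, the residuals from the short regression inherit the persistence of the omitted `z`, so their lag-1 slope should be large and positive, while the residuals from the regression that includes `z` should show a slope near zero. The no-constant lag regression is a simplification that relies on least squares residuals having mean zero when the model contains a constant.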
(Whether $u^*$ can be treated as an unchanging parameter, as we are about to do, is controversial.) By construction, $[u_t - u^*]$ is disequilibrium, or cyclical, unemployment. In this formulation, $\varepsilon_t$ would be the supply shock (i.e., the stimulus that produces the disequilibrium situation). To complete the model, we require a model for the expected inflation. We will revisit this in some detail in Example 19.2. For the present, we'll assume that economic agents are rank empiricists. The forecast of next year's inflation is simply this year's value. This produces the estimating equation

$$\Delta p_t - \Delta p_{t-1} = \beta_1 + \beta_2 u_t + \varepsilon_t$$

where $\beta_2 = \beta$ and $\beta_1 = -\beta u^*$. Note that there is an implied estimate of the natural rate of unemployment embedded in the equation. After estimation, $u^*$ can be estimated by $-b_1/b_2$. The equation was estimated with the 1950.1–2000.4 data in Table F5.1 that were used in Example 12.1 (minus two quarters for the change in the rate of inflation). Least squares estimates (with standard errors in parentheses) are as follows:

$$\Delta p_t - \Delta p_{t-1} = \underset{(0.7405)}{0.49189} - \underset{(0.1257)}{0.090136}\, u_t + e_t, \qquad R^2 = 0.002561, \quad T = 201.$$

The implied estimate of the natural rate of unemployment is 5.46 percent, which is in line with other recent estimates. The estimated asymptotic covariance of $b_1$ and $b_2$ is −0.08973. Using the delta method, we obtain a standard error of 2.2062 for this estimate, so a confidence interval for the natural rate is 5.46 percent ± 1.96(2.21 percent) = (1.13 percent, 9.79 percent) (which seems fairly wide, but, again, whether it is reasonable to treat this as a parameter is at least questionable). The regression of the least squares residuals on their past values gives a slope of −0.4263 with a highly significant $t$ ratio of −6.725. We thus conclude that the residuals (and, apparently, the disturbances) in this model are highly negatively autocorrelated. This is consistent with the striking pattern in Figure 12.3.

[FIGURE 12.2 Residual Plots for Misspecified Models. Four panels of residuals against year, 1959 to 1999; bars mark the mean residual and ±2s(e). (a) Regression on log PG. (b) Regression on log PG, log I/Pop. (c) Full Regression. (d) Full Regression, Separate Coefficients.]

[FIGURE 12.3 Negatively Autocorrelated Residuals. Plot titled "Phillips Curve Deviations from Expected Inflation": residual against quarter, 1950 to 2002.]

The problems for estimation and inference caused by autocorrelation are similar to (although, unfortunately, more involved than) those caused by heteroscedasticity. As before, least squares is inefficient, and inference based on the least squares estimates is adversely affected. Depending on the underlying process, however, GLS and FGLS estimators can be devised that circumvent these problems. There is one qualitative difference to be noted. In Chapter 11, we examined models in which the generalized regression model can be viewed as an extension of the regression model to the conditional second moment of the dependent variable. In the case of autocorrelation, the phenomenon arises in almost all cases from a misspecification of the model. Views differ on how one should react to this failure of the classical assumptions, from a pragmatic one that treats it as another "problem" in the data to an orthodox methodological view that it represents a major specification issue; see, for example, "A Simple Message to Autocorrelation Correctors: Don't" [Mizon (1995)].

We should emphasize that the models we shall examine here are quite far removed from the classical regression. The exact or small-sample properties of the estimators are rarely known, and only their asymptotic properties have been derived.

12.2 THE ANALYSIS OF TIME-SERIES DATA

The treatment in this chapter will be the first structured analysis of time series data in the text. (We had a brief encounter in Section 5.3 where we established some conditions under which moments of time series data would converge.) Time-series analysis requires some revision of the interpretation of both data generation and sampling that we have maintained thus far.

A time-series model will typically describe the path of a variable $y_t$ in terms of contemporaneous (and perhaps lagged) factors $x_t$, disturbances (innovations), $\varepsilon_t$, and its own past, $y_{t-1}, \ldots$ For example,

$$y_t = \beta_1 + \beta_2 x_t + \beta_3 y_{t-1} + \varepsilon_t.$$

The time series is a single occurrence of a random event. For example, the quarterly series on real output in the United States from 1950 to 2000 that we examined in Example 12.1 is a single realization of a process, $GDP_t$. The entire history over this period constitutes a realization of the process. At least in economics, the process could not be repeated. There is no counterpart to repeated sampling in a cross section or replication of an experiment involving a time series process in physics or engineering. Nonetheless, were circumstances different at the end of World War II, the observed history could have been different. In principle, a completely different realization of the entire series might have occurred. The sequence of observations, $\{y_t\}_{t=-\infty}^{t=\infty}$, is a time-series process which is characterized by its time ordering and its systematic correlation between observations in the sequence. The signature characteristic of a time series process is that empirically, the data generating mechanism produces exactly one realization of the sequence. Statistical results based on sampling characteristics concern not random sampling from a population, but distributions of statistics constructed from sets of observations taken from this realization in a time window, $t = 1, \ldots, T$. Asymptotic distribution theory in this context concerns behavior of statistics constructed from an increasingly long window in this sequence.

The properties of $y_t$ as a random variable in a cross section are straightforward and are conveniently summarized in a statement about its mean and variance or the probability distribution generating $y_t$. The statement is less obvious here. It is common to assume that innovations are generated independently from one period to the next, with the familiar assumptions

$$E[\varepsilon_t] = 0, \qquad \mathrm{Var}[\varepsilon_t] = \sigma^2, \qquad \text{and} \qquad \mathrm{Cov}[\varepsilon_t, \varepsilon_s] = 0 \ \text{for } t \neq s.$$

In the current context, this distribution of $\varepsilon_t$ is said to be covariance stationary or weakly stationary. Thus, although the substantive notion of "random sampling" must be extended for the time series $\varepsilon_t$, the mathematical results based on that notion apply here. It can be said, for example, that $\varepsilon_t$ is generated by a time-series process whose mean and variance are not changing over time. As such, by the method we will discuss in this chapter, we could, at least in principle, obtain sample information and use it to characterize the distribution of $\varepsilon_t$. Could the same be said of $y_t$? There is an obvious difference between the series $\varepsilon_t$ and $y_t$; observations on $y_t$ at different points in time are necessarily correlated. Suppose that the $y_t$ series is weakly stationary and that, for the moment, $\beta_2 = 0$. Then we could say that

$$E[y_t] = \beta_1 + \beta_3 E[y_{t-1}] + E[\varepsilon_t] = \beta_1 / (1 - \beta_3)$$

and

$$\mathrm{Var}[y_t] = \beta_3^2 \mathrm{Var}[y_{t-1}] + \mathrm{Var}[\varepsilon_t],$$

or

$$\gamma_0 = \beta_3^2 \gamma_0 + \sigma_\varepsilon^2$$

so that

$$\gamma_0 = \frac{\sigma_\varepsilon^2}{1 - \beta_3^2},$$

where $\sigma_\varepsilon^2 = \mathrm{Var}[\varepsilon_t]$. Thus, $\gamma_0$, the variance of $y_t$, is a fixed characteristic of the process generating $y_t$. Note how the stationarity assumption, which apparently includes $|\beta_3| < 1$, has been used. The assumption that $|\beta_3| < 1$ is needed to ensure a finite and positive variance.[2] Finally, the same results can be obtained for nonzero $\beta_2$ if it is further assumed that $x_t$ is a weakly stationary series.[3]

[2] The current literature in macroeconometrics and time series analysis is dominated by analysis of cases in which $\beta_3 = 1$ (or counterparts in different models). We will return to this subject in Chapter 20.

[3] See Section 12.4.1 on the stationarity assumption.

Alternatively, consider simply repeated substitution of lagged values into the expression for $y_t$:

$$y_t = \beta_1 + \beta_3(\beta_1 + \beta_3 y_{t-2} + \varepsilon_{t-1}) + \varepsilon_t \qquad (12\text{-}1)$$

and so on. We see that, in fact, the current $y_t$ is an accumulation of the entire history of the innovations, $\varepsilon_t$. So if we wish to characterize the distribution of $y_t$, then we might do so in terms of sums of random variables. By continuing to substitute for $y_{t-2}$, then $y_{t-3}, \ldots$ in (12-1), we obtain an explicit representation of this idea,

$$y_t = \sum_{i=0}^{\infty} \beta_3^i (\beta_1 + \varepsilon_{t-i}).$$

Do sums that reach back into the infinite past make any sense? We might view the process as having begun generating data at some remote, effectively "infinite" past. As long as distant observations become progressively less important, the extension to an infinite past is merely a mathematical convenience. The diminishing importance of past observations is implied by $|\beta_3| < 1$. Notice that, not coincidentally, this requirement is the same as that needed to solve for $\gamma_0$ in the preceding paragraphs. A second possibility is to assume that the observation of this time series begins at some time 0 [with $(x_0, \varepsilon_0)$ called the initial conditions], by which time the underlying process has reached a state such that the mean and variance of $y_t$ are not (or are no longer) changing over time. The mathematics are slightly different, but we are led to the same characterization of the random process generating $y_t$. In fact, the same weak stationarity assumption ensures both of them.

Except in very special cases, we would expect all the elements in the $T$ component random vector $(y_1, \ldots, y_T)$ to be correlated. In this instance, said correlation is called "autocorrelation." As such, the results pertaining to estimation with independent or uncorrelated observations that we used in the previous chapters are no longer usable. In point of fact, we have a sample of but one observation on the multivariate random variable $[y_t, t = 1, \ldots, T]$. There is a counterpart to the cross-sectional notion of parameter estimation, but only under assumptions (e.g., weak stationarity) that establish that parameters in the familiar sense even exist. Even with stationarity, it will emerge that for estimation and inference, none of our earlier finite sample results are usable. Consistency and asymptotic normality of estimators are somewhat more difficult to establish in time-series settings because results that require independent observations, such as the central limit theorems, are no longer usable. Nonetheless, counterparts to our earlier results have been established for most of the estimation problems we consider here and in Chapters 19 and 20.

12.3 DISTURBANCE PROCESSES

The preceding section has introduced a bit of the vocabulary and aspects of time series specification. In order to obtain the theoretical results, we need to draw some conclusions about autocorrelation and add some details to that discussion.

12.3.1 CHARACTERISTICS OF DISTURBANCE PROCESSES

In the usual time-series setting, the disturbances are assumed to be homoscedastic but correlated across observations, so that

$$E[\varepsilon \varepsilon' \mid X] = \sigma^2 \Omega,$$

where $\sigma^2 \Omega$ is a full, positive definite matrix with a constant $\sigma^2 = \mathrm{Var}[\varepsilon_t \mid X]$ on the diagonal. As will be clear in the following discussion, we shall also assume that $\Omega_{ts}$ is a function of $|t - s|$, but not of $t$ or $s$ alone, which is a stationarity assumption. (See the preceding section.) It implies that the covariance between observations $t$ and $s$ is a function only of $|t - s|$, the distance apart in time of the observations. We define the autocovariances:

$$\mathrm{Cov}[\varepsilon_t, \varepsilon_{t-s} \mid X] = \mathrm{Cov}[\varepsilon_{t+s}, \varepsilon_t \mid X] = \sigma^2 \Omega_{t,t-s} = \gamma_s = \gamma_{-s}.$$

Note that $\sigma^2 \Omega_{tt} = \gamma_0$. The correlation between $\varepsilon_t$ and $\varepsilon_{t-s}$ is their autocorrelation,

$$\mathrm{Corr}[\varepsilon_t, \varepsilon_{t-s} \mid X] = \frac{\mathrm{Cov}[\varepsilon_t, \varepsilon_{t-s} \mid X]}{\sqrt{\mathrm{Var}[\varepsilon_t \mid X]\, \mathrm{Var}[\varepsilon_{t-s} \mid X]}} = \frac{\gamma_s}{\gamma_0} = \rho_s = \rho_{-s}.$$

We can then write

$$E[\varepsilon \varepsilon' \mid X] = \Gamma = \gamma_0 R,$$

where $\Gamma$ is an autocovariance matrix and $R$ is an autocorrelation matrix; the $ts$ element is an autocorrelation coefficient

$$\rho_{ts} = \frac{\gamma_{|t-s|}}{\gamma_0}.$$

(Note that the matrix $\Gamma = \gamma_0 R$ is the same as $\sigma^2 \Omega$. The name change conforms to standard usage in the literature.) We will usually use the abbreviation $\rho_s$ to denote the autocorrelation between observations $s$ periods apart.

Different types of processes imply different patterns in $R$. For example, the most frequently analyzed process is a first-order autoregression or AR(1) process,

$$\varepsilon_t = \rho \varepsilon_{t-1} + u_t,$$

where $u_t$ is a stationary, nonautocorrelated ("white noise") process and $\rho$ is a parameter. We will verify later that for this process, $\rho_s = \rho^s$. Higher-order autoregressive processes of the form

$$\varepsilon_t = \theta_1 \varepsilon_{t-1} + \theta_2 \varepsilon_{t-2} + \cdots + \theta_p \varepsilon_{t-p} + u_t$$

imply more involved patterns, including, for some values of the parameters, cyclical behavior of the autocorrelations.[4] Stationary autoregressions are structured so that the influence of a given disturbance fades as it recedes into the more distant past but vanishes only asymptotically. For example, for the AR(1), $\mathrm{Cov}[\varepsilon_t, \varepsilon_{t-s}]$ is never zero, but it does become negligible if $|\rho|$ is less than 1. Moving-average processes, conversely, have a short memory. For the MA(1) process,

$$\varepsilon_t = u_t - \lambda u_{t-1},$$

the memory in the process is only one period: $\gamma_0 = \sigma_u^2 (1 + \lambda^2)$, $\gamma_1 = -\lambda \sigma_u^2$, but $\gamma_s = 0$ if $s > 1$.

[4] This model is considered in more detail in Chapter 20.

12.3.2 AR(1) DISTURBANCES

Time-series processes such as the ones listed here can be characterized by their order, the values of their parameters, and the behavior of their autocorrelations.[5] We shall consider various forms at different points. The received empirical literature is overwhelmingly dominated by the AR(1) model, which is partly a matter of convenience. Processes more involved than this model are usually extremely difficult to analyze. There is, however, a more practical reason. It is very optimistic to expect to know precisely the correct form of the appropriate model for the disturbance in any given situation. The first-order autoregression has withstood the test of time and experimentation as a reasonable model for underlying processes that probably, in truth, are impenetrably complex. AR(1) works as a first pass (higher order models are often constructed as a refinement), as in the example below.

[5] See Box and Jenkins (1984) for an authoritative study.

The first-order autoregressive disturbance, or AR(1) process, is represented in the autoregressive form as

$$\varepsilon_t = \rho \varepsilon_{t-1} + u_t, \qquad (12\text{-}2)$$

where

$$E[u_t] = 0, \qquad E[u_t^2] = \sigma_u^2,$$

and

$$\mathrm{Cov}[u_t, u_s] = 0 \quad \text{if } t \neq s.$$

By repeated substitution, we have

$$\varepsilon_t = u_t + \rho u_{t-1} + \rho^2 u_{t-2} + \cdots. \qquad (12\text{-}3)$$

From the preceding moving-average form, it is evident that each disturbance $\varepsilon_t$ embodies the entire past history of the $u$'s, with the most recent observations receiving greater weight than those in the distant past. Depending on the sign of $\rho$, the series will exhibit clusters of positive and then negative observations or, if $\rho$ is negative, regular oscillations of sign (as in Example 12.3).

Since the successive values of $u_t$ are uncorrelated, the variance of $\varepsilon_t$ is the variance of the right-hand side of (12-3):

$$\mathrm{Var}[\varepsilon_t] = \sigma_u^2 + \rho^2 \sigma_u^2 + \rho^4 \sigma_u^2 + \cdots. \qquad (12\text{-}4)$$

To proceed, a restriction must be placed on $\rho$,

$$|\rho| < 1, \qquad (12\text{-}5)$$

because otherwise, the right-hand side of (12-4) will become infinite. This result is the stationarity assumption discussed earlier. With (12-5), which implies that $\lim_{s \to \infty} \rho^s = 0$, $E[\varepsilon_t] = 0$ and

$$\mathrm{Var}[\varepsilon_t] = \frac{\sigma_u^2}{1 - \rho^2} = \sigma_\varepsilon^2. \qquad (12\text{-}6)$$

With the stationarity assumption, there is an easier way to obtain the variance:

$$\mathrm{Var}[\varepsilon_t] = \rho^2 \mathrm{Var}[\varepsilon_{t-1}] + \sigma_u^2$$

as $\mathrm{Cov}[u_t, \varepsilon_s] = 0$ if $t > s$. With stationarity, $\mathrm{Var}[\varepsilon_{t-1}] = \mathrm{Var}[\varepsilon_t]$, which implies (12-6). Proceeding in the same fashion,

$$\mathrm{Cov}[\varepsilon_t, \varepsilon_{t-1}] = E[\varepsilon_t \varepsilon_{t-1}] = E[\varepsilon_{t-1}(\rho \varepsilon_{t-1} + u_t)] = \rho \mathrm{Var}[\varepsilon_{t-1}] = \frac{\rho \sigma_u^2}{1 - \rho^2}. \qquad (12\text{-}7)$$

By repeated substitution in (12-2), we see that for any $s$,

$$\varepsilon_t = \rho^s \varepsilon_{t-s} + \sum_{i=0}^{s-1} \rho^i u_{t-i}$$

(e.g., $\varepsilon_t = \rho^3 \varepsilon_{t-3} + \rho^2 u_{t-2} + \rho u_{t-1} + u_t$). Therefore, since $\varepsilon_s$ is not correlated with any $u_t$ for which $t > s$ (i.e., any subsequent $u_t$), it follows that

$$\mathrm{Cov}[\varepsilon_t, \varepsilon_{t-s}] = E[\varepsilon_t \varepsilon_{t-s}] = \frac{\rho^s \sigma_u^2}{1 - \rho^2}. \qquad (12\text{-}8)$$

Dividing by $\gamma_0 = \sigma_u^2 / (1 - \rho^2)$ provides the autocorrelations:

$$\mathrm{Corr}[\varepsilon_t, \varepsilon_{t-s}] = \rho_s = \rho^s. \qquad (12\text{-}9)$$

With the stationarity assumption, the autocorrelations fade over time. Depending on the sign of $\rho$, they will either be declining in geometric progression or alternating in sign if $\rho$ is negative. Collecting terms, we have

$$\sigma^2 \Omega = \frac{\sigma_u^2}{1 - \rho^2}
\begin{bmatrix}
1 & \rho & \rho^2 & \rho^3 & \cdots & \rho^{T-1} \\
\rho & 1 & \rho & \rho^2 & \cdots & \rho^{T-2} \\
\rho^2 & \rho & 1 & \rho & \cdots & \rho^{T-3} \\
\vdots & \vdots & \vdots & \vdots & \ddots & \vdots \\
\rho^{T-1} & \rho^{T-2} & \rho^{T-3} & \cdots & \rho & 1
\end{bmatrix}. \qquad (12\text{-}10)$$

12.4 SOME ASYMPTOTIC RESULTS FOR ANALYZING TIME SERIES DATA

Since $\Omega$ is not equal to $I$, the now familiar complications will arise in establishing the properties of estimators of $\beta$, in particular of the least squares estimator. The finite sample properties of the OLS and GLS estimators remain intact. Least squares will continue to be unbiased; the earlier general proof allows for autocorrelated disturbances. The Aitken theorem and the distributional results for normally distributed disturbances can still be established conditionally on $X$. (However, even these will be complicated when $X$ contains lagged values of the dependent variable.) But finite sample properties are of very limited usefulness in time series contexts. Nearly all that can be said about estimators involving time series data is based on their asymptotic properties.

As we saw in our analysis of heteroscedasticity, whether least squares is consistent or not depends on the matrices

$$Q_T = (1/T) X'X,$$

and

$$Q_T^* = (1/T) X' \Omega X.$$

In our earlier analyses, we were able to argue for convergence of $Q_T$ to a positive definite matrix of constants, $Q$, by invoking laws of large numbers. But these theorems assume that the observations in the sums are independent, which, as suggested in Section 12.1, is surely not the case here. Thus, we require a different tool for this result. We can expand the matrix $Q_T^*$ as

$$Q_T^* = \frac{1}{T} \sum_{t=1}^{T} \sum_{s=1}^{T} \rho_{ts}\, x_t x_s', \qquad (12\text{-}11)$$

where $x_t'$ and $x_s'$ are rows of $X$ and $\rho_{ts}$ is the autocorrelation between $\varepsilon_t$ and $\varepsilon_s$. Sufficient conditions for this matrix to converge are that $Q_T$ converge and that the correlations between disturbances die off reasonably rapidly as the observations become further apart in time. For example, if the disturbances follow the AR(1) process described earlier, then $\rho_{ts} = \rho^{|t-s|}$ and if $x_t$ is sufficiently well behaved, $Q_T^*$ will converge to a positive definite matrix $Q^*$ as $T \to \infty$.

Asymptotic normality of the least squares and GLS estimators will depend on the behavior of sums such as

$$\sqrt{T}\, \bar{w}_T = \sqrt{T} \left( \frac{1}{T} \sum_{t=1}^{T} x_t \varepsilon_t \right) = \sqrt{T} \left( \frac{1}{T} X' \varepsilon \right).$$

Asymptotic normality of least squares is difficult to establish for this general model. The central limit theorems we have relied on thus far do not extend to sums of dependent observations. The results of Amemiya (1985), Mann and Wald (1943), and Anderson (1971) do carry over to most of the familiar types of autocorrelated disturbances, including those that interest us here, so we shall ultimately conclude that ordinary least squares, GLS, and instrumental variables continue to be consistent and asymptotically normally distributed, and, in the case of OLS, inefficient. This section will provide a brief introduction to some of the underlying principles which are used to reach these conclusions.

12.4.1 CONVERGENCE OF MOMENTS: THE ERGODIC THEOREM

The discussion thus far has suggested (appropriately) that stationarity (or its absence) is an important characteristic of a process. The points at which we have encountered this notion concerned requirements that certain sums converge to finite values. In particular, for the AR(1) model, $\varepsilon_t = \rho \varepsilon_{t-1} + u_t$, in order for the variance of the process to be finite, we require $|\rho| < 1$, which is a sufficient condition. However, this result is only a byproduct. Stationarity (at least, the weak stationarity we have examined) is only a characteristic of the sequence of moments of a distribution.

DEFINITION 12.1 Strong Stationarity
A time series process, $\{z_t\}_{t=-\infty}^{t=\infty}$, is strongly stationary, or "stationary," if the joint probability distribution of any set of $k$ observations in the sequence, $[z_t, z_{t+1}, \ldots, z_{t+k}]$, is the same regardless of the origin, $t$, in the time scale.

For example, in (12-2), if we add $u_t \sim N[0, \sigma_u^2]$, then the resulting process $\{\varepsilon_t\}_{t=-\infty}^{t=\infty}$ can easily be shown to be strongly stationary.

DEFINITION 12.2 Weak Stationarity
A time series process, $\{z_t\}_{t=-\infty}^{t=\infty}$, is weakly stationary (or covariance stationary) if $E[z_t]$ is finite and is the same for all $t$ and if the covariance between any two observations (labeled their autocovariance), $\mathrm{Cov}[z_t, z_{t-k}]$, is a finite function only of model parameters and their distance apart in time, $k$, but not of the absolute location of either observation on the time scale.

Weak stationarity is obviously implied by strong stationarity, though it requires less since the distribution can, at least in principle, be changing on the time axis. The distinction is rarely necessary in applied work. In general, save for narrow theoretical examples, it will be difficult to come up with a process that is weakly but not strongly stationary. The reason for the distinction is that in much of our work, only weak stationarity is required, and, as always, when possible, econometricians will dispense with unnecessary assumptions.

As we will discover shortly, stationarity is a crucial characteristic at this point in the analysis. If we are going to proceed to parameter estimation in this context, we will also require another characteristic of a time series, ergodicity. There are various ways to delineate this characteristic, none of them particularly intuitive. We borrow one definition from Davidson and MacKinnon (1993, p. 132) which comes close:

DEFINITION 12.3 Ergodicity
A time series process, $\{z_t\}_{t=-\infty}^{t=\infty}$, is ergodic if for any two bounded functions that map vectors in the $a$ and $b$ dimensional real vector spaces to real scalars, $f: \mathbb{R}^a \to \mathbb{R}^1$ and $g: \mathbb{R}^b \to \mathbb{R}^1$,

$$\lim_{k \to \infty} \left| E[f(z_t, z_{t+1}, \ldots, z_{t+a})\, g(z_{t+k}, z_{t+k+1}, \ldots, z_{t+k+b})] \right| = \left| E[f(z_t, z_{t+1}, \ldots, z_{t+a})] \right| \left| E[g(z_{t+k}, z_{t+k+1}, \ldots, z_{t+k+b})] \right|.$$

The definition states essentially that if events are separated far enough in time, then they are "asymptotically independent." An implication is that in a time series, every observation will contain at least some unique information. Ergodicity is a crucial element of our theory of estimation. When a time series has this property (with stationarity), then we can consider estimation of parameters in a meaningful sense.[6] The analysis relies heavily on the following theorem:

THEOREM 12.1 The Ergodic Theorem
If $\{z_t\}_{t=-\infty}^{t=\infty}$ is a time-series process which is stationary and ergodic and $E[|z_t|]$ is a finite constant and $E[z_t] = \mu$, and if $\bar{z}_T = (1/T) \sum_{t=1}^{T} z_t$, then $\bar{z}_T \xrightarrow{a.s.} \mu$. Note that the convergence is almost surely, not in probability (which is implied) or in mean square (which is also implied). [See White (2001, p. 44) and Davidson and MacKinnon (1993, p. 133).]

[6] Much of the analysis in later chapters will encounter nonstationary series, which are the focus of most of the current literature; tests for nonstationarity largely dominate the recent study in time series analysis. Ergodicity is a much more subtle and difficult concept. For any process which we will consider, ergodicity will have to be a given, at least at this level. A classic reference on the subject is Doob (1953). Another authoritative treatise is Billingsley (1979). White (2001) provides a concise analysis of many of these concepts as used in econometrics, and some useful commentary.

What we have in The Ergodic Theorem is, for sums of dependent observations, a counterpart to the laws of large numbers that we have used at many points in the preceding chapters. Note, once again, the need for this extension is that to this point, our laws of large numbers have required sums of independent observations. But, in this context, by design, observations are distinctly not independent.

In order for this result to be useful, we will require an extension.

THEOREM 12.2 Ergodicity of Functions
If $\{z_t\}_{t=-\infty}^{t=\infty}$ is a time series process which is stationary and ergodic and if $y_t = f\{z_t\}$ is a measurable function in the probability space that defines $z_t$, then $y_t$ is also stationary and ergodic. Let $\{z_t\}_{t=-\infty}^{t=\infty}$ define a $K \times 1$ vector valued stochastic process: each element of the vector is an ergodic and stationary series and the characteristics of ergodicity and stationarity apply to the joint distribution of the elements of $\{z_t\}_{t=-\infty}^{t=\infty}$. Then The Ergodic Theorem applies to functions of $\{z_t\}_{t=-\infty}^{t=\infty}$. (See White (2001, pp. 44–45) for discussion.)

Theorem 12.2 produces the results we need to characterize the least squares (and other) estimators. In particular, our minimal assumptions about the data are

ASSUMPTION 12.1 Ergodic Data Series: In the regression model, $y_t = x_t'\beta + \varepsilon_t$, $[x_t, \varepsilon_t]_{t=-\infty}^{t=\infty}$ is a jointly stationary and ergodic process.

By analyzing terms element by element we can use these results directly to assert that averages of $w_t = x_t \varepsilon_t$, $Q_{tt} = x_t x_t'$ and $Q_{tt}^* = \varepsilon_t^2 x_t x_t'$ will converge to their population counterparts, $0$, $Q$ and $Q^*$.

12.4.2 CONVERGENCE TO NORMALITY: A CENTRAL LIMIT THEOREM

In order to form a distribution theory for least squares, GLS, ML, and GMM, we will need a counterpart to the central limit theorem. In particular, we need to establish a large sample distribution theory for quantities of the form

$$\sqrt{T} \left( \frac{1}{T} \sum_{t=1}^{T} x_t \varepsilon_t \right) = \sqrt{T}\, \bar{w}.$$

As noted earlier, we cannot invoke the familiar central limit theorems (Lindeberg–Levy, Lindeberg–Feller, Liapounov) because the observations in th
esumarenotindependent.But,withtheassumptionsalreadymade,wedohaveanalternativeresult.Someneededpreliminariesareasfollows:DEFINITION12.4MartingaleSequenceAvectorsequenceztisamartingalesequenceifE[zt|zt−1,zt−2,...]=zt−1.\nGreene-50240bookJune17,200214:1CHAPTER12✦SerialCorrelation263Animportantexampleofamartingalesequenceistherandomwalk,zt=zt−1+utwhereCov[ut,us]=0forallt=s.ThenE[zt|zt−1,zt−2,...]=E[zt−1|zt−1,zt−2,...]+E[ut|zt−1,zt−2,...]=zt−1+0=zt−1.DEFINITION12.5MartingaleDifferenceSequenceAvectorsequenceztisamartingaledifferencesequenceifE[zt|zt−1,zt−2,...]=0.WithDefinition12.5,wehavethefollowingbroadlyencompassingresult:THEOREM12.3MartingaleDifferenceCentralLimitTheoremIfztisavectorvaluedstationaryandergodicmartingaledifferencesequence,withTE[ztz√t]=,whereisafinitepositivedefinitematrix,andifz¯T=(1/T)t=1zt,dthenTz¯T−→N[0,].(Fordiscussion,seeDavidsonandMacKinnon(1993,Sections.4.7and4.8.)7Theorem12.3isageneralizationoftheLindberg–LevyCentralLimitTheorem.Itisnotyetbroadenoughtocovercasesofautocorrelation,butitdoesgobeyondLindberg–Levy,forexample,inextendingtotheGARCHmodelofSection11.8.[FormsofthetheoremwhichsurpassLindberg–Feller(D.19)andLiapounov(Theo-remD.20)byallowingfordifferentvariancesateachtime,t,appearinRuud(2000,p.479)andWhite(2001,p.133).Thesevariantsextendbeyondourrequirementsinthistreatment.]But,lookingahead,thisresultencompasseswhatwillbeaveryimportantapplication.Supposeintheclassicallinearregressionmodel,{x}t=∞isastationarytt=−∞andergodicmultivariatestochasticprocessand{ε}t=∞isani.i.d.process—thatis,tt=−∞notautocorrelatedandnotheteroscedastic.Then,thisisthemostgeneralcaseoftheclassicalmodelwhichstillmaintainstheassumptionsaboutεtthatwemadeinChap-ter2.Inthiscase,theprocess{w}t=∞={xε}t=∞isamartingaledifferencesequence,tt=−∞ttt=−∞sothatwithsufficientassumptionsonthemomentsofxtwecouldusethisresulttoestablishconsistencyandasymptoticnormalityoftheleastsquaresestimator.[See,e.g.,Hamilton(1994,pp.208–212).]Wenowconsideracentrallimittheoremthatisbroadenoughtoinclud
ethecasethatinterestedusattheoutset,stochasticallydependentobservationsonxtand7Forconvenience,wearebypassingastepinthisdiscussion—establishingmultivariatenormalityrequiresthattheresultfirstbeestablishedforthemarginalnormaldistributionofeachcomponent,thenthateverylinearcombinationofthevariablesalsobenormallydistributed.Ourinterestatthispointismerelytocollecttheusefulendresults.InterestedusersmayfindthedetaileddiscussionsofthemanysubtletiesandnarrowerpointsinWhite(2001)andDavidsonandMacKinnon(1993,Chapter4).\nGreene-50240bookJune17,200214:1264CHAPTER12✦SerialCorrelationautocorrelationinε.8Supposeasbeforethat{z}t=∞isastationaryandergodict√tt=−∞stochasticprocess.WeconsiderTz¯.Thefollowingconditionsareassumed:9T1.Summabilityofautocovariances:Withdependentobservations,√∞∞∞limVar[Tz¯]=Cov[zz]==∗tskT→∞t=0s=0k=−∞Tobegin,wewillneedtoassumethatthismatrixisfinite,aconditioncalledsummability.NotethisistheconditionneededforconvergenceofQ∗in(12-11).IfthesumistobeTfinite,thenthek=0termmustbefinite,whichgivesusanecessaryconditionE[zz]=,afinitematrix.tt02.Asymptoticuncorrelatedness:E[zt|zt−k,zt−k−1,...]convergesinmeansquaretozeroask→∞.Notethatissimilartotheconditionforergodicity.White(2001)demon-stratesthata(nonobvious)implicationofthisassumptionisE[zt]=0.3.Asymptoticnegligibilityofinnovations:Letrtk=E[zt|zt−k,zt−k−1,...]−E[zt|zt−k−1,zt−k−2,...].Anobservationztmaybeviewedastheaccumulatedinformationthathasenteredtheprocesssinceitbeganuptotimet.Thus,itcanbeshownthat∞zt=rtss=0Thevectorrtkcanbeviewedastheinformationinthisaccumulatedsumthatenteredtheprocessattimet−k.Theconditionimposedontheprocessisthat∞E[rr]s=0tstsbefinite.Inwords,condition(3)statesthatinformationeventuallybecomesnegligibleasitfadesfarbackintimefromthecurrentobservation.TheAR(1)model(asusual)helpstoillustratethispoint.Ifzt=ρzt−1+ut,thenrt0=E[zt|zt,zt−1,...]−E[zt|zt−1,zt−2,...]=zt−ρzt−1=utrt1=E[zt|zt−1,zt−2...]−E[zt|zt−2,zt−3...]=E[ρzt−1+ut|zt−1,zt−2...]−E[ρ(ρzt−2+ut−1)+ut|zt−2,zt−3,...]=ρ(zt−1−ρzt−2)=ρut−1.k∞sByasimilarc
onstruction,rtk=ρut−kfromwhichitfollowsthatzt=s=0ρut−s,whichwesawearlierin(12-3).Youcanverifythatif|ρ|<1,thenegligibilityconditionwillbemet.8Detailedanalysisofthiscaseisquiteintricateandwellbeyondthescopeofthisbook.SomefairlyterseanalysismaybefoundinWhite(2001,pp.122–133)andHayashi(2000).9SeeHayashi(2000,p.405)whoattributestheresultstoGordin(1969).\nGreene-50240bookJune17,200214:1CHAPTER12✦SerialCorrelation265Withallthismachineryinplace,wenowhavethetheoremwewillneed:THEOREM12.4Gordin’sCentralLimitTheorem√d∗Ifconditions(1)–(3)listedabovearemet,thenTz¯T−→N[0,].Wewillbeabletoemploythesetoolswhenweconsidertheleastsquares,IVandGLSestimatorsinthediscussiontofollow.12.5LEASTSQUARESESTIMATIONTheleastsquaresestimatoris−1XXXεb=(XX)−1Xy=β+.TTUnbiasednessfollowsfromtheresultsinChapter4—nomodificationisneeded.WeknowfromChapter10thattheGauss–MarkovTheoremhasbeenlost—assumingitex-ists(thatremainstobeestablished),theGLSestimatorisefficientandOLSisnot.HowmuchinformationislostbyusingleastsquaresinsteadofGLSdependsonthedata.Broadly,leastsquaresfaresbetterindatawhichhavelongperiodsandlittlecyclicalvariation,suchasaggregateoutputseries.Asmightbeexpected,thegreateristheauto-correlationinε,thegreaterwillbethebenefittousinggeneralizedleastsquares(whenthisispossible).Evenifthedisturbancesarenormallydistributed,theusualFandtstatisticsdonothavethosedistributions.So,notmuchremainsofthefinitesampleprop-ertiesweobtainedinChapter4.Theasymptoticpropertiesremaintobeestablished.12.5.1ASYMPTOTICPROPERTIESOFLEASTSQUARESTheasymptoticpropertiesofbarestraightforwardtoestablishgivenourearlierresults.Ifweassumethattheprocessgeneratingxtisstationaryandergodic,thenbyTheo-rems12.1and12.2,(1/T)(XX)convergestoQandwecanapplytheSlutskytheoremtotheinverse.Ifεtisnotseriallycorrelated,thenwt=xtεtisamartingaledifferencesequence,so(1/T)(Xε)convergestozero.Thisestablishesconsistencyforthesimplecase.Ontheotherhand,if[xt,εt]arejointlystationaryandergodic,thenwecaninvoketheErgodicTheorems12.1and12.2forbothmomentmatricesan
destablishconsistency.Asymptoticnormalityisabitmoresubtle.Forthecasewithoutserialcorrelationin√εt,wecanemployTheorem12.3forTw¯.Theinvolvedcaseistheonethatinterestedusattheoutsetofthisdiscussion,thatis,wherethereisautocorrelationinεtanddependenceinxt.Theorem12.4isinplaceforthiscase.Onceagain,theconditionsdescribedintheprecedingsectionmustapplyand,moreover,theassumptionsneededwillhavetobeestablishedbothforxtandεt.CommentaryonthesecasesmaybefoundinDavidsonandMacKinnon(1993),Hamilton(1994),White(2001),andHayashi(2000).Formalpresentationextendsbeyondthescopeofthistext,soatthispoint,wewillproceed,andassumethattheconditionsunderlyingTheorem12.4aremet.Theresultssuggested\nGreene-50240bookJune17,200214:1266CHAPTER12✦SerialCorrelationherearequitegeneral,albeitonlysketchedforthegeneralcase.Fortheremainderofourexamination,atleastinthischapter,wewillconfineattentiontofairlysimpleprocessesinwhichthenecessaryconditionsfortheasymptoticdistributiontheorywillbefairlyevident.Thereisanimportantexceptiontotheresultsintheprecedingparagraph.Iftheregressioncontainsanylaggedvaluesofthedependentvariable,thenleastsquareswillnolongerbeunbiasedorconsistent.Totakethesimplestcase,supposethatyt=βyt−1+εt,(12-12)εt=ρεt−1+ut.andassume|β|<1,|ρ|<1.Inthismodel,theregressorandthedisturbancearecorre-lated.Therearevariouswaystoapproachtheanalysis.Oneusefulwayistorearrange(12-12)bysubtractingρyt−1fromyt.Then,yt=(β+ρ)yt−1−βρyt−2+ut(12-13)whichisaclassicalregressionwithstochasticregressors.Sinceutisaninnovationinperiodt,itisuncorrelatedwithbothregressors,andleastsquaresregressionofyton(yt−1,yt−2)estimatesρ1=(β+ρ)andρ2=−βρ.Whatisestimatedbyregressionofytonyt−1alone?Letγk=Cov[yt,yt−k]=Cov[yt,yt+k].Bystationarity,Var[yt]=Var[yt−1],andCov[yt,yt−1]=Cov[yt−1,yt−2],andsoon.Theseand(12-13)implythefollowingrelationships.γ=ργ+ργ+σ201122uγ1=ρ1γ0+ρ2γ1(12-14)γ2=ρ1γ1+ρ2γ0(ThesearetheYuleWalkerequationsforthismodel.SeeSection20.2.3.)Theslopeinthesimpleregressionestimatesγ1/γ0whichcanbefoundinthesolutionstothesethreeequation
s.(Analternativeapproachistousetheleftoutvariableformula,whichisausefulwaytointerpretthisestimator.)Inthiscase,weseethattheslopeintheshortregressionisanestimatorof(β+ρ)−βρ(γ1/γ0).Ineithercase,solvingthethreeequationsin(12-14)forγ,γandγintermsofρ,ρandσ2produces01212uβ+ρplimb=.(12-15)1+βρThisresultisbetweenβ(whenρ=0)and1(whenbothβandρ=1).Therefore,leastsquaresisinconsistentunlessρequalszero.Themoregeneralcasethatincludesregres-sors,xt,involvesmorecomplicatedalgebra,butgivesessentiallythesameresult.Thisisageneralresult;whentheequationcontainsalaggeddependentvariableinthepres-enceofautocorrelation,OLSandGLSareinconsistent.Theproblemcanbeviewedasoneofanomittedvariable.12.5.2ESTIMATINGTHEVARIANCEOFTHELEASTSQUARESESTIMATORAsusual,s2(XX)−1isaninappropriateestimatorofσ2(XX)−1(XX)(XX)−1,bothbecauses2isabiasedestimatorofσ2andbecausethematrixisincorrect.Generalities\nGreene-50240bookJune17,200214:1CHAPTER12✦SerialCorrelation267TABLE12.1RobustCovarianceEstimationVariableOLSEstimateOLSSECorrectedSEConstant0.77460.03350.0733lnOutput0.29550.01900.0394lnCPI0.56130.03390.0708R2=0.99655,d=0.15388,r=0.92331.arescarce,butingeneral,foreconomictimeserieswhicharepositivelyrelatedtotheirpastvalues,thestandarderrorsconventionallyestimatedbyleastsquaresarelikelytobetoosmall.Forslowlychanging,trendingaggregatessuchasoutputandconsumption,thisisprobablythenorm.Forhighlyvariabledatasuchasinflation,exchangerates,andmarketreturns,thesituationislessclear.Nonetheless,asageneralproposition,onewouldnormallynotwanttorelyons2(XX)−1asanestimatoroftheasymptoticcovariancematrixoftheleastsquaresestimator.Inviewofthissituation,ifoneisgoingtouseleastsquares,thenitisdesirabletohaveanappropriateestimatorofthecovariancematrixoftheleastsquaresestimator.Therearetwoapproaches.Iftheformoftheautocorrelationisknown,thenonecanestimatetheparametersofdirectlyandcomputeaconsistentestimator.Ofcourse,ifso,thenitwouldbemoresensibletousefeasiblegeneralizedleastsquaresinsteadandnotwastethesampleinformationonaninefficientestimat
or.ThesecondapproachparallelstheuseoftheWhiteestimatorforheteroscedasticity.Supposethattheformoftheautocorrelationisunknown.Then,adirectestimatorofor(θ)isnotavailable.Theproblemisestimationof1TT=ρxx.(12-16)|t−s|tsTt=1s=1FollowingWhite’ssuggestionforheteroscedasticity,NeweyandWest’s(1987a)robust,consistentestimatorforautocorrelateddisturbanceswithanunspecifiedstructureisLT1jS=S+1−ee[xx+xx],(12-17)∗0tt−jtt−jt−jtTL+1j=1t=j+1[See(10-16)inSection10.3.]ThemaximumlagLmustbedeterminedinadvancetobelargeenoughthatautocorrelationsatlagslongerthanLaresmallenoughtoignore.Foramoving-averageprocess,thisvaluecanbeexpectedtobearelativelysmallnumber.Forautoregressiveprocessesormixtures,however,theautocorrelationsareneverzero,andtheresearchermustmakeajudgmentastohowfarbackitisnecessarytogo.10Example12.4AutocorrelationConsistentCovarianceEstimationForthemodelshowninExample12.1,theregressionresultswiththeuncorrectedstandarderrorsandtheNewey-Westautocorrelationrobustcovariancematrixforlagsof5quartersareshowninTable12.1.Theeffectoftheveryhighdegreeofautocorrelationisevident.10DavidsonandMacKinnon(1993)givefurtherdiscussion.CurrentpracticeistousethesmallestintegergreaterthanorequaltoT1/4.\nGreene-50240bookJune17,200214:1268CHAPTER12✦SerialCorrelation12.6GMMESTIMATIONTheGMMestimatorintheregressionmodelwithautocorrelateddisturbancesispro-ducedbytheempiricalmomentequations1T1xy−xβˆ=Xεˆβˆ=m¯βˆ=0.(12-18)tttGMMGMMGMMTTt=1Theestimatorisobtainedbyminimizingq=m¯βˆWm¯βˆGMMGMMwhereWisapositivedefiniteweightingmatrix.Theoptimalweightingmatrixwouldbe√−1W=Asy.Var[Tm¯(β)]whichistheinverseof√1n1TTAsy.Var[Tm¯(β)]=Asy.Var√xε=plimσ2ρxx=σ2Q∗.iitstsTn→∞Ti=1t=1s=1Theoptimalweightingmatrixwouldbe[σ2Q∗]−1.Asintheheteroscedasticitycase,thisminimizationproblemisanexactlyidentifiedcase,so,theweightingmatrixisirrelevanttothesolution.TheGMMestimatorfortheregressionmodelwithautocorrelateddis-turbancesisordinaryleastsquares.WecanusetheresultsinSection12.5.2toconstructtheasymptoticcovariancematrix.Wewillrequiretheassumpt
ionsinSection12.4toobtainconvergenceofthemomentsandasymptoticnormality.Wewillwishtoextendthissimpleresultinoneinstance.Inthecommoncaseinwhichxtcontainslaggedval-uesofyt,wewillwanttouseaninstrumentalvariableestimator.WewillreturntothatestimationprobleminSection12.9.4.12.7TESTINGFORAUTOCORRELATIONTheavailabletestsforautocorrelationarebasedontheprinciplethatifthetruedisturbancesareautocorrelated,thenthisfactcanbedetectedthroughtheautocorre-lationsoftheleastsquaresresiduals.Thesimplestindicatoristheslopeintheartificialregressionet=ret−1+vt,e=y−xb.ttt
$$r = \left(\sum_{t=2}^{T}e_t e_{t-1}\right)\Bigg/\left(\sum_{t=1}^{T}e_t^2\right). \tag{12-19}$$

If there is autocorrelation, then the slope in this regression will be an estimator of $\rho = \mathrm{Corr}[\varepsilon_t, \varepsilon_{t-1}]$. The complication in the analysis lies in determining a formal means of evaluating when the estimator is “large,” that is, on what statistical basis to reject the null hypothesis that $\rho$ equals zero. As a first approximation, treating (12-19) as a classical linear model and using a $t$ or $F$ (squared $t$) test to test the hypothesis is a valid way to proceed based on the Lagrange multiplier principle. We used this device in Example 12.3. The tests we consider here are refinements of this approach.

12.7.1 LAGRANGE MULTIPLIER TEST

The Breusch (1978)–Godfrey (1978) test is a Lagrange multiplier test of $H_0$: no autocorrelation versus $H_1$: $\varepsilon_t = \mathrm{AR}(P)$ or $\varepsilon_t = \mathrm{MA}(P)$. The same test is used for either structure. The test statistic is

$$LM = T\left(\frac{\mathbf e'\mathbf X_0(\mathbf X_0'\mathbf X_0)^{-1}\mathbf X_0'\mathbf e}{\mathbf e'\mathbf e}\right) = T R_0^2, \tag{12-20}$$

where $\mathbf X_0$ is the original $\mathbf X$ matrix augmented by $P$ additional columns containing the lagged OLS residuals, $e_{t-1}, \ldots, e_{t-P}$. The test can be carried out simply by regressing the ordinary least squares residuals $e_t$ on $\mathbf x_{t0}$ (filling in missing values for lagged residuals with zeros) and referring $T R_0^2$ to the tabled critical value for the chi-squared distribution with $P$ degrees of freedom.^11 Since $\mathbf X'\mathbf e = \mathbf 0$, the test is equivalent to regressing $e_t$ on the part of the lagged residuals that is unexplained by $\mathbf X$. There is therefore a compelling logic to it; if any fit is found, then it is due to correlation between the current and lagged residuals. The test is a joint test of the first $P$ autocorrelations of $\varepsilon_t$, not just the first.

^11 A warning to practitioners: Current software varies on whether the lagged residuals are filled with zeros or the first $P$ observations are simply dropped when computing this statistic. In the interest of replicability, users should determine which is the case before reporting results.

12.7.2 BOX AND PIERCE’S TEST AND LJUNG’S REFINEMENT

An alternative test which is asymptotically equivalent to the LM test when the null hypothesis, $\rho = 0$, is true and when $\mathbf X$ does not contain lagged values of $y$ is due to Box and Pierce (1970). The $Q$ test is carried out by referring

$$Q = T\sum_{j=1}^{P}r_j^2, \tag{12-21}$$

where $r_j = \left(\sum_{t=j+1}^{T}e_t e_{t-j}\right)\big/\left(\sum_{t=1}^{T}e_t^2\right)$, to the critical values of the chi-squared table with $P$ degrees of freedom. A refinement suggested by Ljung and Box (1979) is

$$Q' = T(T+2)\sum_{j=1}^{P}\frac{r_j^2}{T-j}. \tag{12-22}$$

The essential difference between the Godfrey–Breusch and the Box–Pierce tests is the use of partial correlations (controlling for $\mathbf X$ and the other variables) in the former and simple correlations in the latter. Under the null hypothesis, there is no autocorrelation in $\varepsilon_t$, and no correlation between $\mathbf x_t$ and $\varepsilon_s$ in any event, so the two tests are asymptotically equivalent. On the other hand, since it does not condition on $\mathbf x_t$, the Box–Pierce test is less powerful than the LM test when the null hypothesis is false, as intuition might suggest.

12.7.3 THE DURBIN–WATSON TEST

The Durbin–Watson statistic^12 was the first formal procedure developed for testing for autocorrelation using the least squares residuals. The test statistic is

$$d = \frac{\sum_{t=2}^{T}(e_t - e_{t-1})^2}{\sum_{t=1}^{T}e_t^2} = 2(1 - r) - \frac{e_1^2 + e_T^2}{\sum_{t=1}^{T}e_t^2}, \tag{12-23}$$

where $r$ is the same first order autocorrelation which underlies the preceding two statistics. If the sample is reasonably large, then the last term will be negligible, leaving $d \approx 2(1 - r)$. The statistic takes this form because the authors were able to determine the exact distribution of this transformation of the autocorrelation and could provide tables of critical values. Useable critical values which depend only on $T$ and $K$ are presented in tables such as that at the end of this book. The one-sided test for $H_0$: $\rho = 0$ against $H_1$: $\rho > 0$ is carried out by comparing $d$ to values $d_L(T, K)$ and $d_U(T, K)$. If $d < d_L$, the hypothesis is rejected; if $d > d_U$, the hypothesis is not rejected. If $d$ lies between $d_L$ and $d_U$, then no conclusion is drawn.

^12 Durbin and Watson (1950, 1951, 1971).

12.7.4 TESTING IN THE PRESENCE OF A LAGGED DEPENDENT VARIABLE

The Durbin–Watson test is not likely to be valid when there is a lagged dependent variable in the equation.^13 The statistic will usually be biased toward a finding of no autocorrelation. Three alternatives have been devised. The LM and $Q$ tests can be used whether or not the regression contains a lagged dependent variable. As an alternative to the standard test, Durbin (1970) derived a Lagrange multiplier test that is appropriate in the presence of a lagged dependent variable. The test may be carried out by referring

$$h = r\sqrt{\frac{T}{1 - T s_c^2}}, \tag{12-24}$$

where $s_c^2$ is the estimated variance of the least squares regression coefficient on $y_{t-1}$, to the standard normal tables. Large values of $h$ lead to rejection of $H_0$. The test has the virtues that it can be used even if the regression contains additional lags of $y_t$, and it can be computed using the standard results from the initial regression without any further regressions. If $s_c^2 > 1/T$, however, then it cannot be computed. An alternative is to regress $e_t$ on $\mathbf x_t, y_{t-1}, \ldots, e_{t-1}$, and any additional lags that are appropriate for $e_t$ and then to test the joint significance of the coefficient(s) on the lagged residual(s) with the standard $F$ test. This method is a minor modification of the Breusch–Godfrey test. Under $H_0$, the coefficients on the remaining variables will be zero, so the tests are the same asymptotically.

^13 This issue has been studied by Nerlove and Wallis (1966), Durbin (1970), and Dezhbaksh (1990).

12.7.5 SUMMARY OF TESTING PROCEDURES

The preceding has examined several testing procedures for locating autocorrelation in the disturbances. In all cases, the procedure examines the least squares residuals. We can summarize the procedures as follows:

LM Test: $LM = TR^2$ in a regression of the least squares residuals on $[\mathbf x_t, e_{t-1}, \ldots, e_{t-P}]$. Reject $H_0$ if $LM > \chi^2_*[P]$. This test examines the covariance of the residuals with lagged values, controlling for the intervening effect of the independent variables.

Q Test: $Q = T(T+2)\sum_{j=1}^{P}r_j^2/(T-j)$. Reject $H_0$ if $Q > \chi^2_*[P]$. This test examines the raw correlations between the residuals and $P$ lagged values of the residuals.

Durbin–Watson Test: $d \approx 2(1 - r)$. Reject $H_0$: $\rho = 0$ if $d < d_L$. This test examines the first order autocorrelation of the least squares residuals.

Durbin’s Test: $F$ statistic for the joint significance of the $P$ lagged residuals in the regression of the least squares residuals on $[\mathbf x_t, y_{t-1}, \ldots, e_{t-1}, \ldots, e_{t-P}]$ described in Section 12.7.4. Reject $H_0$ if $F > F_*[P, T - K - P]$. This test examines the partial correlations between the residuals and the lagged residuals, controlling for the intervening effect of the independent variables and the lagged dependent variable.

The Durbin–Watson test has some major shortcomings. The inconclusive region is large if $T$ is small or moderate. The bounding distributions, while free of the parameters $\boldsymbol\beta$ and $\sigma$, do depend on the data (and assume that $\mathbf X$ is nonstochastic). An exact version based on an algorithm developed by Imhof (1980) avoids the inconclusive region, but is rarely used. The LM and Box–Pierce statistics do not share these shortcomings; their limiting distributions are chi-squared independently of the data and the parameters. For this reason, the LM test has become the standard method in applied research.

12.8 EFFICIENT ESTIMATION WHEN $\boldsymbol\Omega$ IS KNOWN

As a prelude to deriving feasible estimators for $\boldsymbol\beta$ in this model, we consider full generalized least squares estimation assuming that $\boldsymbol\Omega$ is known. In the next section, we will turn to the more realistic case in which $\boldsymbol\Omega$ must be estimated as well.

If the parameters of $\boldsymbol\Omega$ are known, then the GLS estimator,

$$\hat{\boldsymbol\beta} = (\mathbf X'\boldsymbol\Omega^{-1}\mathbf X)^{-1}(\mathbf X'\boldsymbol\Omega^{-1}\mathbf y), \tag{12-25}$$

and the estimate of its sampling variance,

$$\mathrm{Est.Var}[\hat{\boldsymbol\beta}] = \hat\sigma_\varepsilon^2\,[\mathbf X'\boldsymbol\Omega^{-1}\mathbf X]^{-1}, \tag{12-26}$$

where

$$\hat\sigma_\varepsilon^2 = \frac{(\mathbf y - \mathbf X\hat{\boldsymbol\beta})'\boldsymbol\Omega^{-1}(\mathbf y - \mathbf X\hat{\boldsymbol\beta})}{T}, \tag{12-27}$$

can be computed in one step. For the AR(1) case, data for the transformed model are

$$\mathbf y_* = \begin{bmatrix}\sqrt{1-\rho^2}\,y_1 \\ y_2 - \rho y_1 \\ y_3 - \rho y_2 \\ \vdots \\ y_T - \rho y_{T-1}\end{bmatrix}, \qquad \mathbf X_* = \begin{bmatrix}\sqrt{1-\rho^2}\,\mathbf x_1 \\ \mathbf x_2 - \rho\mathbf x_1 \\ \mathbf x_3 - \rho\mathbf x_2 \\ \vdots \\ \mathbf x_T - \rho\mathbf x_{T-1}\end{bmatrix}. \tag{12-28}$$

These transformations are variously labeled partial differences, quasi differences, or pseudodifferences. Note that in the transformed model, every observation except the first contains a constant term. What was the column of 1s in $\mathbf X$ is transformed to $[(1-\rho^2)^{1/2}, (1-\rho), (1-\rho), \ldots]$. Therefore, if the sample is relatively small, then the problems with measures of fit noted in Section 3.5 will reappear.

The variance of the transformed disturbance is

$$\mathrm{Var}[\varepsilon_t - \rho\varepsilon_{t-1}] = \mathrm{Var}[u_t] = \sigma_u^2.$$

The variance of the first disturbance is also $\sigma_u^2$ [see (12-6)]. This can be estimated using $(1 - \rho^2)\hat\sigma_\varepsilon^2$.

Corresponding results have been derived for higher-order autoregressive processes. For the AR(2) model,

$$\varepsilon_t = \theta_1\varepsilon_{t-1} + \theta_2\varepsilon_{t-2} + u_t, \tag{12-29}$$

the transformed data for generalized least squares are obtained by

$$\begin{aligned}
z_{*1} &= \left[\frac{(1+\theta_2)\left[(1-\theta_2)^2 - \theta_1^2\right]}{1-\theta_2}\right]^{1/2} z_1, \\
z_{*2} &= \left(1-\theta_2^2\right)^{1/2} z_2 - \frac{\theta_1\left(1-\theta_2^2\right)^{1/2}}{1-\theta_2}\, z_1, \\
z_{*t} &= z_t - \theta_1 z_{t-1} - \theta_2 z_{t-2}, \quad t > 2,
\end{aligned} \tag{12-30}$$

where $z_t$ is used for $y_t$ or $\mathbf x_t$. The transformation becomes progressively more complex for higher-order processes.^14

^14 See Box and Jenkins (1984) and Fuller (1976).

Note that in both the AR(1) and AR(2) models, the transformation to $\mathbf y_*$ and $\mathbf X_*$ involves “starting values” for the processes that depend only on the first one or two observations. We can view the process as having begun in the infinite past. Since the sample contains only $T$ observations, however, it is convenient to treat the first one or two (or $P$) observations as shown and consider them as “initial values.” Whether we view the process as having begun at time $t = 1$ or in the infinite past is ultimately immaterial in regard to the asymptotic properties of the estimators.

The asymptotic properties for the GLS estimator are quite straightforward given the apparatus we assembled in Section 12.4. We begin by assuming that $\{\mathbf x_t, \varepsilon_t\}$ are jointly an ergodic, stationary process. Then, after the GLS transformation, $\{\mathbf x_{*t}, \varepsilon_{*t}\}$ is also stationary and ergodic. Moreover, $\varepsilon_{*t}$ is nonautocorrelated by construction. In the transformed model, then, $\{\mathbf w_{*t}\} = \{\mathbf x_{*t}\varepsilon_{*t}\}$ is a stationary and ergodic martingale difference series. We can use the Ergodic Theorem to establish consistency and the Central Limit Theorem for martingale difference sequences to establish asymptotic normality for GLS in this model. Formal arrangement of the relevant results is left as an exercise.

12.9 ESTIMATION WHEN $\boldsymbol\Omega$ IS UNKNOWN

For an unknown $\boldsymbol\Omega$, there are a variety of approaches. Any consistent estimator of $\boldsymbol\Omega(\rho)$ will suffice; recall from Theorem (10.8) in Section 10.5.2, all that is needed for efficient estimation of $\boldsymbol\beta$ is a consistent estimator of $\boldsymbol\Omega(\rho)$. The complication arises, as might be expected, in estimating the autocorrelation parameter(s).

12.9.1 AR(1) DISTURBANCES

The AR(1) model is the one most widely used and studied. The most common procedure is to begin FGLS with a natural estimator of $\rho$, the autocorrelation of the residuals. Since $\mathbf b$ is consistent, we can use $r$. Others that have been suggested include Theil’s (1971) estimator, $r[(T-K)/(T-1)]$, and Durbin’s (1970), the slope on $y_{t-1}$ in a regression of $y_t$ on $y_{t-1}$, $\mathbf x_t$ and $\mathbf x_{t-1}$. The second step is FGLS based on (12-25)–(12-28). This is the Prais and Winsten (1954) estimator. The Cochrane and Orcutt (1949) estimator (based on computational ease) omits the first observation.

It is possible to iterate any of these estimators to convergence. Since the estimator is asymptotically efficient at every iteration, nothing is gained by doing so. Unlike the heteroscedastic model, iterating when there is autocorrelation does not produce the maximum likelihood estimator. The iterated FGLS estimator, regardless of the estimator of $\rho$, does not account for the term $(1/2)\ln(1-\rho^2)$ in the log-likelihood function [see the following (12-31)].

Maximum likelihood estimators can be obtained by maximizing the log-likelihood with respect to $\boldsymbol\beta$, $\sigma_u^2$, and $\rho$. The log-likelihood function may be written

$$\ln L = -\frac{\sum_{t=1}^{T}u_t^2}{2\sigma_u^2} + \frac{1}{2}\ln(1-\rho^2) - \frac{T}{2}\left(\ln 2\pi + \ln\sigma_u^2\right), \tag{12-31}$$

where, as before, the first observation is computed differently from the others using (12-28). For a given value of $\rho$, the maximum likelihood estimators of $\boldsymbol\beta$ and $\sigma_u^2$ are the usual ones, GLS and the mean squared residual using the transformed data. The problem is estimation of $\rho$. One possibility is to search the range $-1 < \rho < 1$ for the value that with the implied estimates of the other parameters maximizes $\ln L$. [This is Hildreth and Lu’s (1960) approach.] Beach and MacKinnon (1978a) argue that this way to do the search is very inefficient and have devised a much faster algorithm. Omitting the first observation and adding an approximation at the lower right corner produces the standard approximations to the asymptotic variances of the estimators,

$$\begin{aligned}
\mathrm{Est.Asy.Var}\left[\hat{\boldsymbol\beta}_{ML}\right] &= \hat\sigma^2_{\varepsilon,ML}\left[\mathbf X'\hat{\boldsymbol\Omega}_{ML}^{-1}\mathbf X\right]^{-1}, \\
\mathrm{Est.Asy.Var}\left[\hat\sigma^2_{u,ML}\right] &= 2\hat\sigma^4_{u,ML}/T, \\
\mathrm{Est.Asy.Var}\left[\hat\rho_{ML}\right] &= \left(1 - \hat\rho^2_{ML}\right)/T.
\end{aligned} \tag{12-32}$$

All the foregoing estimators have the same asymptotic properties. The available evidence on their small-sample properties comes from Monte Carlo studies and is, unfortunately, only suggestive. Griliches and Rao (1969) find evidence that if the sample is relatively small and $\rho$ is not particularly large, say less than 0.3, then least squares is as good as or better than FGLS. The problem is the additional variation introduced into the sampling variance by the variance of $r$. Beyond these, the results are rather mixed. Maximum likelihood seems to perform well in general, but the Prais–Winsten estimator is evidently nearly as efficient. Both estimators have been incorporated in all contemporary software. In practice, Beach and MacKinnon’s maximum likelihood estimator is probably the most common choice.

12.9.2 AR(2) DISTURBANCES

Maximum likelihood procedures for most other disturbance processes are exceedingly complex. Beach and MacKinnon (1978b) have derived an algorithm for AR(2) disturbances. For higher-order autoregressive models, maximum likelihood estimation is presently impractical, but the two-step estimators can easily be extended. For models of the form

$$\varepsilon_t = \theta_1\varepsilon_{t-1} + \theta_2\varepsilon_{t-2} + \cdots + \theta_p\varepsilon_{t-p} + u_t, \tag{12-33}$$

a simple approach for estimation of the autoregressive parameters is to use the following method: Regress $e_t$ on $e_{t-1}, \ldots, e_{t-p}$, to obtain consistent estimates of the autoregressive parameters. With the estimates of $\theta_1, \ldots, \theta_p$ in hand, the Cochrane–Orcutt estimator can be obtained. If the model is an AR(2), the full FGLS procedure can be used instead. The least squares computations for the transformed data provide (at least asymptotically) the appropriate estimates of $\sigma_u^2$ and the covariance matrix of $\hat{\boldsymbol\beta}$. As before, iteration is possible but brings no gains in efficiency.

12.9.3 APPLICATION: ESTIMATION OF A MODEL WITH AUTOCORRELATION

A restricted version of the model for the U.S. gasoline market that appears in Example 12.2 is

$$\ln\frac{G_t}{pop_t} = \beta_1 + \beta_2\ln P_{G,t} + \beta_3\ln\frac{I_t}{pop_t} + \beta_4\ln P_{NC,t} + \beta_5\ln P_{UC,t} + \varepsilon_t.$$

TABLE 12.2 Parameter Estimates (Standard Errors in Parentheses)

                     β1        β2         β3        β4         β5         ρ
OLS                  −7.736    −0.0591    1.373     −0.127     −0.119     0.000
(R² = 0.95799)       (0.674)   (0.0325)   (0.0756)  (0.127)    (0.0813)   (0.000)
Prais–Winsten        −6.782    −0.152     1.267     −0.0308    −0.0638    0.862
                     (0.955)   (0.0370)   (0.107)   (0.127)    (0.0758)   (0.0855)
Cochrane–Orcutt      −7.147    −0.149     1.307     −0.0599    −0.0563    0.849
                     (1.297)   (0.0382)   (0.144)   (0.146)    (0.0789)   (0.0893)
Maximum Likelihood   −5.159    −0.208     1.0828    0.0878     −0.0351    0.930
                     (1.132)   (0.0349)   (0.127)   (0.125)    (0.0659)   (0.0620)
AR(2)                −11.828   −0.0310    1.415     −0.192     −0.114     0.760
                     (0.888)   (0.0292)   (0.0682)  (0.133)    (0.0846)   (r1)

AR(2): θ1 = 0.9936319, θ2 = −0.4620284

The results in Figure 12.2 suggest that the specification above may be incomplete, and, if so, there may be autocorrelation in the disturbance in this specification. Least squares estimation of the equation produces the results in the first row of Table 12.2. The first 5 autocorrelations of the least squares residuals are 0.674, 0.207, −0.049, −0.159, and −0.158. This produces Box–Pierce and Box–Ljung statistics of 19.816 and 21.788, respectively, both of which are larger than the critical value from the chi-squared table of 11.07. We regressed the least squares residuals on the independent variables and five lags of the residuals. The coefficients on the lagged residuals and the associated $t$ statistics are 1.075 (5.493), −0.712 (−2.488), 0.310 (0.968), −0.227 (−0.758), 0.000096 (0.000). The $R^2$ in this regression is 0.598223, which produces a chi-squared value of 21.536. The conclusion is the same. Finally, the Durbin–Watson statistic is 0.60470. For four regressors and 36 observations, the
critical value of $d_L$ is 1.24, so on this basis as well, the hypothesis $\rho = 0$ would be rejected. The plot of the residuals shown in Figure 12.4 seems consistent with this conclusion. The Prais and Winsten FGLS estimates appear in the second row of Table 12.2, followed by the Cochrane and Orcutt results, then the maximum likelihood estimates. In each of these cases, the autocorrelation coefficient is reestimated using the FGLS residuals. This recomputed value is what appears in the table.

FIGURE 12.4 Least Squares Residuals. (Plot of the least squares residuals, E, against Year, 1959 to 1999.)

One might want to examine the residuals after estimation to ascertain whether the AR(1) model is appropriate. In the results above, there are two large autocorrelation coefficients listed with the residual based tests, and in computing the LM statistic, we found that the first two coefficients were statistically significant. If the AR(1) model is appropriate, then one should find that only the coefficient on the first lagged residual is statistically significant in this auxiliary, second step regression. Another indicator is provided by the FGLS residuals themselves. After computing the FGLS regression, the estimated residuals,

$$\hat\varepsilon_t = y_t - \mathbf x_t'\hat{\boldsymbol\beta},$$

will still be autocorrelated. In our results using the Prais–Winsten estimates, the autocorrelation of the FGLS residuals is 0.865. The associated Durbin–Watson statistic is 0.278. This is to be expected. However, if the model is correct, then the transformed residuals

$$\hat u_t = \hat\varepsilon_t - \hat\rho\,\hat\varepsilon_{t-1}$$

should be at least close to nonautocorrelated. But, for our data, the autocorrelation of the adjusted residuals is 0.438 with a Durbin–Watson statistic of 1.125. It appears on this basis that, in fact, the AR(1) model has not completed the specification.

The results noted earlier suggest that an AR(2) process might better characterize the disturbances in this model. Simple regression of the least squares residuals on a constant and two lagged values (the two period counterpart to a method of obtaining $r$ in the AR(1) model) produces slope coefficients of 0.9936319 and −0.4620284.^15 The GLS transformations for the AR(2) model are given in (12-30). We recomputed the regression using the AR(2) transformation and these two coefficients. These are the final results shown in Table 12.2. They do bring a substantial change in the results. As an additional check on the adequacy of the model, we computed the corrected FGLS residuals from the AR(2) model,

$$\hat u_t = \hat\varepsilon_t - \hat\theta_1\hat\varepsilon_{t-1} - \hat\theta_2\hat\varepsilon_{t-2}.$$

The first five autocorrelations of these residuals are 0.132, 0.134, 0.016, 0.022, and −0.118. The Box–Pierce and Box–Ljung statistics are 1.605 and 1.857, which are far from statistically significant. We thus conclude that the AR(2) model accounts for the autocorrelation in the data.

^15 In fitting an AR(1) model, the stationarity condition is obvious; $|r|$ must be less than one. For an AR(2) process, the condition is less than obvious. We will examine this issue in Chapter 20. For the present, we merely state the result; the two values $(1/2)[\theta_1 \pm (\theta_1^2 + 4\theta_2)^{1/2}]$ must be less than one in absolute value. Since the term in parentheses might be negative, the “roots” might be a complex pair $a \pm bi$, in which case $a^2 + b^2$ must be less than one. You can verify that the two complex roots for our process above are indeed “inside the unit circle.”

The preceding suggests how one might discover the appropriate model for autocorrelation in a regression model. However, it is worth keeping in mind that the source of the autocorrelation might itself be discernible in the data. The finding of an AR(2) process may still suggest that the regression specification is incomplete or inadequate in some way.

12.9.4 ESTIMATION WITH A LAGGED DEPENDENT VARIABLE

In Section 12.5.1, we considered the problem of estimation by least squares when the model contains both autocorrelation and lagged dependent variable(s). Since the OLS estimator is inconsistent, the residuals on which an estimator of $\rho$ would be based are likewise inconsistent. Therefore, $\hat\rho$ will be inconsistent as well. The consequence is that the FGLS estimators described earlier are not usable in this case. There is, however, an alternative way to proceed, based on the method of instrumental variables. The method of instrumental variables was introduced in Section 5.4. To review, the general problem is that in the regression model, if

$$\mathrm{plim}(1/T)\mathbf X'\boldsymbol\varepsilon \neq \mathbf 0,$$

then the least squares estimator is not consistent. A consistent estimator is

$$\mathbf b_{IV} = (\mathbf Z'\mathbf X)^{-1}(\mathbf Z'\mathbf y),$$

where $\mathbf Z$ is a set of $K$ variables chosen
such that $\operatorname{plim}(1/T)Z'\varepsilon = 0$ but $\operatorname{plim}(1/T)Z'X \neq 0$. For the purpose of consistency only, any such set of instrumental variables will suffice. The relevance of that here is that the obstacle to consistent FGLS is, at least for the present, the lack of a consistent estimator of $\rho$. By using the technique of instrumental variables, we may estimate $\beta$ consistently, then estimate $\rho$ and proceed.

Hatanaka (1974, 1976) has devised an efficient two-step estimator based on this principle. To put the estimator in the current context, we consider estimation of the model

$$y_t = x_t'\beta + \gamma y_{t-1} + \varepsilon_t,$$
$$\varepsilon_t = \rho\varepsilon_{t-1} + u_t.$$

To get to the second step of FGLS, we require a consistent estimator of the slope parameters. These estimates can be obtained using an IV estimator, where the column of $Z$ corresponding to $y_{t-1}$ is the only one that need be different from that of $X$. An appropriate instrument can be obtained by using the fitted values in the regression of $y_t$ on $x_t$ and $x_{t-1}$. The residuals from the IV regression are then used to construct

$$\hat\rho = \frac{\sum_{t=3}^{T}\hat\varepsilon_t\hat\varepsilon_{t-1}}{\sum_{t=3}^{T}\hat\varepsilon_t^2},$$

where

$$\hat\varepsilon_t = y_t - b_{IV}'x_t - c_{IV}\,y_{t-1}.$$

FGLS estimates may now be computed by regressing $y_{*t} = y_t - \hat\rho y_{t-1}$ on

$$x_{*t} = x_t - \hat\rho x_{t-1},$$
$$y_{*t-1} = y_{t-1} - \hat\rho y_{t-2},$$
$$\hat\varepsilon_{t-1} = y_{t-1} - b_{IV}'x_{t-1} - c_{IV}\,y_{t-2}.$$

Let $d$ be the coefficient on $\hat\varepsilon_{t-1}$ in this regression. The efficient estimator of $\rho$ is

$$\hat{\hat\rho} = \hat\rho + d.$$

Appropriate asymptotic standard errors for the estimators, including $\hat{\hat\rho}$, are obtained from the $s^2[X_*'X_*]^{-1}$ computed at the second step. Hatanaka shows that these estimators are asymptotically equivalent to maximum likelihood estimators.

12.10 COMMON FACTORS

We saw in Example 12.2 that misspecification of an equation could create the appearance of serially correlated disturbances when, in fact, there are none. An orthodox (perhaps somewhat optimistic) purist might argue that autocorrelation is always an artifact of misspecification. Although this view might be extreme [see, e.g., Hendry (1980) for a more moderate, but still strident statement], it does suggest a useful point. It might be useful if we could examine the specification of a model statistically with this consideration in mind. The test for common factors is such a test. [See, as well, the aforementioned paper by Mizon (1995).]

The assumption that the correctly specified model is

$$y_t = x_t'\beta + \varepsilon_t, \quad \varepsilon_t = \rho\varepsilon_{t-1} + u_t, \quad t = 1, \ldots, T$$

implies the "reduced form,"

$$M_0: \; y_t = \rho y_{t-1} + (x_t - \rho x_{t-1})'\beta + u_t, \quad t = 2, \ldots, T,$$

where $u_t$ is free from serial correlation. The second of these is actually a restriction on the model

$$M_1: \; y_t = \rho y_{t-1} + x_t'\beta + x_{t-1}'\alpha + u_t, \quad t = 2, \ldots, T,$$

in which, once again, $u_t$ is a classical disturbance. The second model contains $2K + 1$ parameters, but if the model is correct, then $\alpha = -\rho\beta$ and there are only $K + 1$ parameters and $K$ restrictions. Both $M_0$ and $M_1$ can be estimated by least squares, although $M_0$ is a nonlinear model. One might then test the restrictions of $M_0$ using an $F$ test. This test will be valid asymptotically, although its exact distribution in finite samples will not be precisely $F$. In large samples, $KF$ will converge to a chi-squared statistic, so we use the $F$ distribution as usual to be conservative. There is a minor practical complication in implementing this test. Some elements of $\alpha$ may not be estimable. For example, if $x_t$ contains a constant term, then the one in $\alpha$ is unidentified. If $x_t$ contains both current and lagged values of a variable, then the one-period lagged value will appear twice in $M_1$, once in $x_t$ as the lagged value and once in $x_{t-1}$ as the current value. There are other combinations that will be problematic, so the actual number of restrictions that appear in the test is reduced to the number of identified parameters in $\alpha$.

Example 12.5 Tests for Common Factors

We will examine the gasoline demand model of Example 12.2 and consider a simplified version of the equation

$$\ln\frac{G_t}{pop_t} = \beta_1 + \beta_2\ln P_{G,t} + \beta_3\ln\frac{I_t}{pop_t} + \beta_4\ln P_{NC,t} + \beta_5\ln P_{UC,t} + \varepsilon_t.$$

If the AR(1) model is appropriate for $\varepsilon_t$, then the restricted model,

$$\ln\frac{G_t}{pop_t} = \beta_1 + \beta_2(\ln P_{G,t} - \rho\ln P_{G,t-1}) + \beta_3\left(\ln\frac{I_t}{pop_t} - \rho\ln\frac{I_{t-1}}{pop_{t-1}}\right) + \beta_4(\ln P_{NC,t} - \rho\ln P_{NC,t-1}) + \beta_5(\ln P_{UC,t} - \rho\ln P_{UC,t-1}) + \rho\ln\frac{G_{t-1}}{pop_{t-1}} + u_t,$$

with six free coefficients will not significantly degrade the fit of the unrestricted model, which has 10 free coefficients. The $F$ statistic, with 4 and 25 degrees of freedom, for this test equals 4.311, which is larger than the critical value of 2.76. Thus, we would conclude that the AR(1) model would not be appropriate for this specification and these data. Note that we reached the same conclusion after a more conventional analysis of the residuals in the application in Section 12.9.3.

12.11 FORECASTING IN THE PRESENCE OF AUTOCORRELATION

For purposes of forecasting, we refer
first to the transformed model,

$$y_{*t} = x_{*t}'\beta + \varepsilon_{*t}.$$

Suppose that the process generating $\varepsilon_t$ is an AR(1) and that $\rho$ is known. Since this model is a classical regression model, the results of Section 6.6 may be used. The optimal forecast of $y^0_{*T+1}$, given $x^0_{T+1}$ and $x_T$ (i.e., $x^0_{*T+1} = x^0_{T+1} - \rho x_T$), is

$$\hat{y}^0_{*T+1} = x^{0\prime}_{*T+1}\hat\beta.$$

Disassembling $\hat{y}^0_{*T+1}$, we find that

$$\hat{y}^0_{T+1} - \rho y_T = x^{0\prime}_{T+1}\hat\beta - \rho x_T'\hat\beta$$

or

$$\hat{y}^0_{T+1} = x^{0\prime}_{T+1}\hat\beta + \rho(y_T - x_T'\hat\beta) = x^{0\prime}_{T+1}\hat\beta + \rho e_T. \tag{12-34}$$

Thus, we carry forward a proportion $\rho$ of the estimated disturbance in the preceding period. This step can be justified by reference to

$$E[\varepsilon_{T+1} \mid \varepsilon_T] = \rho\varepsilon_T.$$

It can also be shown that to forecast $n$ periods ahead, we would use

$$\hat{y}^0_{T+n} = x^{0\prime}_{T+n}\hat\beta + \rho^n e_T.$$

The extension to higher-order autoregressions is direct. For a second-order model, for example,

$$\hat{y}^0_{T+n} = \hat\beta'x^0_{T+n} + \theta_1 e_{T+n-1} + \theta_2 e_{T+n-2}. \tag{12-35}$$

For residuals that are outside the sample period, we use the recursion

$$e_s = \theta_1 e_{s-1} + \theta_2 e_{s-2}, \tag{12-36}$$

beginning with the last two residuals within the sample.

Moving average models are somewhat simpler, as the autocorrelation lasts for only $Q$ periods. For an MA(1) model, for the first postsample period,

$$\hat{y}^0_{T+1} = x^{0\prime}_{T+1}\hat\beta + \hat\varepsilon_{T+1},$$

where

$$\hat\varepsilon_{T+1} = \hat{u}_{T+1} - \lambda\hat{u}_T.$$

Therefore, a forecast of $\varepsilon_{T+1}$ will use all previous residuals. One way to proceed is to accumulate $\hat\varepsilon_{T+1}$ from the recursion

$$\hat{u}_t = \hat\varepsilon_t + \lambda\hat{u}_{t-1}$$

with $\hat{u}_{T+1} = \hat{u}_0 = 0$ and $\hat\varepsilon_t = (y_t - x_t'\hat\beta)$. After the first postsample period,

$$\hat\varepsilon_{T+n} = \hat{u}_{T+n} - \lambda\hat{u}_{T+n-1} = 0.$$

If the parameters of the disturbance process are known, then the variances for the forecast errors can be computed using the results of Section 6.6. For an AR(1) disturbance, the estimated variance would be

$$s_f^2 = \hat\sigma_\varepsilon^2 + (x_t - \rho x_{t-1})'\{\text{Est.Var}[\hat\beta]\}(x_t - \rho x_{t-1}). \tag{12-37}$$

For a higher-order process, it is only necessary to modify the calculation of $x_{*t}$ accordingly. The forecast variances for an MA(1) process are somewhat more involved. Details may be found in Judge et al. (1985) and Hamilton (1994). If the parameters of the disturbance process, $\rho$, $\lambda$, $\theta_j$, and so on, are estimated as well, then the forecast variance will be greater. For an AR(1) model, the necessary correction to the forecast variance of the $n$-period-ahead forecast error is $\hat\sigma_\varepsilon^2 n^2\rho^{2(n-1)}/T$. [For a one-period-ahead forecast, this merely adds a term, $\hat\sigma_\varepsilon^2/T$, to the variance in (12-37).] Higher-order AR and MA processes are analyzed in Baillie (1979). Finally, if the regressors are stochastic, the expressions become more complex by another order of magnitude.

If $\rho$ is known, then (12-34) provides the best linear unbiased forecast of $y_{t+1}$.[16] If, however, $\rho$ must be estimated, then this assessment must be modified. There is information about $\varepsilon_{t+1}$ embodied in $e_t$. Having to estimate $\rho$, however, implies that some or all of the value of this information is offset by the variation introduced into the forecast by including the stochastic component $\hat\rho e_t$.[17] Whether (12-34) is preferable to the obvious expedient $\hat{y}^0_{T+n} = \hat\beta'x^0_{T+n}$ in a small sample when $\rho$ is estimated remains to be settled.

[16] See Goldberger (1962).
[17] See Baillie (1979).

12.12 SUMMARY AND CONCLUSIONS

This chapter has examined the generalized regression model with serial correlation in the disturbances. We began with some general results on analysis of time-series data. When we consider dependent observations and serial correlation, the laws of large numbers and central limit theorems used to analyze independent observations no longer suffice. We presented some useful tools which extend these results to time-series settings. We then considered estimation and testing in the presence of autocorrelation. As usual, OLS is consistent but inefficient. The Newey–West estimator is a robust estimator for the asymptotic covariance matrix of the OLS estimator. This pair of estimators also constitutes the GMM estimator for the regression model with autocorrelation. We then considered two-step feasible generalized least squares and maximum likelihood estimation for the special case usually analyzed by practitioners, the AR(1) model. The model with a correction for autocorrelation is a restriction on a more general model with lagged values of both dependent and independent variables. We considered a means of testing this specification as an alternative to "fixing" the problem of autocorrelation.

Key Terms and Concepts

• AR(1)
• Asymptotic negligibility
• Asymptotic normality
• Autocorrelation
• Autocorrelation matrix
• Autocovariance
• Autocovariance matrix
• Autoregressive form
• Cochrane–Orcutt estimator
• Common factor model
• Covariance stationarity
• Durbin–Watson test
• Ergodicity
• Ergodic Theorem
• First-order autoregression
• Expectations augmented Phillips curve
• GMM estimator
• Initial conditions
• Innovation
• Lagrange multiplier test
• Martingale sequence
• Martingale difference sequence
• Moving average form
• Moving average process
• Partial difference
• Prais–Winsten estimator
• Pseudo differences
• Q test
• Quasi differences
• Stationarity
• Summability
• Time-series process
• Time window
• Weakly stationary
• White noise
• Yule Walker equations

Exercises

1. Does first differencing reduce autocorrelation? Consider the models $y_t = \beta'x_t + \varepsilon_t$, where $\varepsilon_t = \rho\varepsilon_{t-1} + u_t$ and $\varepsilon_t = u_t - \lambda u_{t-1}$. Compare the autocorrelation of $\varepsilon_t$ in the original model with that of $v_t$ in $y_t - y_{t-1} = \beta'(x_t - x_{t-1}) + v_t$, where $v_t = \varepsilon_t - \varepsilon_{t-1}$.

2. Derive the disturbance covariance matrix for the model
$$y_t = \beta'x_t + \varepsilon_t, \quad \varepsilon_t = \rho\varepsilon_{t-1} + u_t - \lambda u_{t-1}.$$
What parameter is estimated by the regression of the OLS residuals on their lagged values?

3. The following regression is obtained by ordinary least squares, using 21 observations. (Estimated asymptotic standard errors are shown in parentheses.)
$$y_t = \underset{(0.3)}{1.3} + \underset{(0.18)}{0.97}\,y_{t-1} + \underset{(1.04)}{2.31}\,x_t, \quad D\text{–}W = 1.21.$$
Test for the presence of autocorrelation in the disturbances.

4. It is commonly asserted that the Durbin–Watson statistic is only appropriate for testing for first-order autoregressive disturbances. What combination of the coefficients of the model is estimated by the Durbin–Watson statistic in each of the following cases: AR(1), AR(2), MA(1)? In each case, assume that the regression model does not contain a lagged dependent variable. Comment on the impact on your results of relaxing this assumption.

5. The data used to fit the expectations augmented Phillips curve in Example 12.3 are given in Table F5.1. Using these data, reestimate the model given in the example. Carry out a formal test for first-order autocorrelation using the LM statistic. Then, reestimate the model using an AR(1) model for the disturbance process. Since the sample is large, the Prais–Winsten and Cochrane–Orcutt estimators should give essentially the same answer. Do they? After fitting the model, obtain the transformed residuals and examine them for first-order autocorrelation. Does the AR(1) model appear to have adequately "fixed" the problem?

6. Data for fitting an improved Phillips curve model can be obtained from many sources, including the Bureau of Economic Analysis's (BEA) own website, Economagic.com, and so on. Obtain the necessary data and expand the model of Example 12.3. Does adding additional explanatory variables to the model reduce the extreme pattern of the OLS residuals that appears in Figure 12.3?

13 MODELS FOR PANEL DATA

13.1 INTRODUCTION

Data sets that combine time series and cross sections are common in economics. For example, the published statistics of the OECD contain numerous series of economic aggregates observed yearly for many countries. Recently constructed longitudinal data sets contain observations on thousands of individuals or families, each observed at several points in time. Other empirical studies have analyzed time-series data on sets of firms, states, countries, or industries simultaneously. These data sets provide rich sources of information about the economy. Modeling in this setting, however, calls for some complex stochastic specifications. In this chapter, we will survey the most commonly used techniques for time-series cross-section data analyses in single equation models.

13.2 PANEL DATA MODELS

Many recent studies have analyzed panel, or longitudinal, data sets. Two very famous ones are the National Longitudinal Survey of Labor Market Experience (NLS) and the Michigan Panel Study of Income Dynamics (PSID). In these data sets, very large cross sections, consisting of thousands of microunits, are followed through time, but the number of periods is often quite small. The PSID, for example, is a study of roughly 6,000 families and 15,000 individuals who have been interviewed periodically from 1968 to the present. Another group of intensively studied panel data sets were those from the negative income tax experiments of the early 1970s in which thousands of families were followed for 8 or 13 quarters. Constructing long, evenly spaced time series in contexts such as these would be prohibitively expensive, but for the purposes for which these data are typically used, it is unnecessary. Time effects are often viewed as "transitions" or discrete changes of state. They are typically modeled as specific to the period in which they occur and are not carried across periods within a cross-sectional unit.[1] Panel data sets are more oriented toward cross-section analyses; they are wide but typically short. Heterogeneity across units is an integral part (indeed, often the central focus) of the analysis.

[1] Theorists have not been deterred from devising autocorrelation models applicable to panel data sets, though. See, for example, Lee (1978) or Park, Sickles, and Simar (2000). As a practical matter, however, the empirical literature in this field has focused on cross-sectional variation and less intricate time-series models. Formal time-series modeling of the sort discussed in Chapter 12 is somewhat unusual in the analysis of longitudinal data.

The analysis of panel or longitudinal data is the subject of one of the most active and innovative bodies of literature in econometrics,[2] partly because panel data provide such a rich environment for the development of estimation techniques and theoretical results. In more practical terms, however, researchers have been able to use time-series cross-sectional data to examine issues that could not be studied in either cross-sectional or time-series settings alone. Two examples are as follows.

1. In a widely cited study of labor supply, Ben-Porath (1973) observes that at a certain point in time, in a cohort of women, 50 percent may appear to be working. It is ambiguous whether this finding implies that, in this cohort, one-half of the women on average will be working or that the same one-half will be working in every period. These have very different implications for policy and for the interpretation of any statistical results. Cross-sectional data alone will not shed any light on the question.

2. A long-standing problem in the analysis of production functions has been the inability to separate economies of scale and technological change.[3] Cross-sectional data provide information only about the former, whereas time-series data muddle the two effects, with no prospect of separation. It is common, for example, to assume constant returns to scale so as to reveal the technical change.[4] Of course, this practice assumes away the problem. A panel of data on costs or output for a number of firms each observed over several years can provide estimates of both the rate of technological change (as time progresses) and economies of scale (for the sample of different sized firms at each point in time).

In principle, the methods of Chapter 12 can be applied to longitudinal data sets. In the typical panel, however, there are a large number of cross-sectional units and only a few periods. Thus, the time-series methods discussed there may be somewhat problematic. Recent work has generally concentrated on models better suited to these short and wide data sets. The techniques are focused on cross-sectional variation, or heterogeneity. In this chapter, we shall examine in detail the most widely used models and look briefly at some extensions.

The fundamental advantage of a panel data set over a cross section is that it will allow the researcher great flexibility in modeling differences in behavior across individuals.

[2] The panel data literature rivals the received research on unit roots and cointegration in econometrics in its rate of growth. A compendium of the earliest literature is Maddala (1993). Book-length surveys on the econometrics of panel data include Hsiao (1986), Dielman (1989), Matyas and Sevestre (1996), Raj and Baltagi (1992), and Baltagi (1995). There are also lengthy surveys devoted to specific topics, such as limited dependent variable models [Hsiao, Lahiri, Lee, and Pesaran (1999)] and semiparametric methods [Lee (1998)]. An extensive bibliography is given in Baltagi (1995).

[3] The distinction between these two effects figured prominently in the policy question of whether it was appropriate to break up the AT&T Corporation in the 1980s and, ultimately, to allow competition in the provision of long-distance telephone service.

[4] In a classic study of this issue, Solow (1957) states: "From time series of $\dot{Q}/Q$, $w_K$, $\dot{K}/K$, $w_L$ and $\dot{L}/L$ or their discrete year-to-year analogues, we could estimate $\dot{A}/A$ and thence $A(t)$ itself. Actually an amusing thing happens here. Nothing has been said so far about returns to scale. But if all factor inputs are classified either as $K$ or $L$, then the available figures always show $w_K$ and $w_L$ adding up to one. Since we have assumed that factors are paid their marginal products, this amounts to assuming the hypothesis of Euler's theorem. The calculus being what it is, we might just as well assume the conclusion, namely, the $F$ is homogeneous of degree one."

The basic framework for this discussion is a regression model
of the form

$$y_{it} = x_{it}'\beta + z_i'\alpha + \varepsilon_{it}. \tag{13-1}$$

There are $K$ regressors in $x_{it}$, not including a constant term. The heterogeneity, or individual effect, is $z_i'\alpha$, where $z_i$ contains a constant term and a set of individual or group specific variables, which may be observed, such as race, sex, location, and so on, or unobserved, such as family specific characteristics, individual heterogeneity in skill or preferences, and so on, all of which are taken to be constant over time $t$. As it stands, this model is a classical regression model. If $z_i$ is observed for all individuals, then the entire model can be treated as an ordinary linear model and fit by least squares. The various cases we will consider are:

1. Pooled Regression: If $z_i$ contains only a constant term, then ordinary least squares provides consistent and efficient estimates of the common $\alpha$ and the slope vector $\beta$.

2. Fixed Effects: If $z_i$ is unobserved, but correlated with $x_{it}$, then the least squares estimator of $\beta$ is biased and inconsistent as a consequence of an omitted variable. However, in this instance, the model
$$y_{it} = x_{it}'\beta + \alpha_i + \varepsilon_{it},$$
where $\alpha_i = z_i'\alpha$, embodies all the observable effects and specifies an estimable conditional mean. This fixed effects approach takes $\alpha_i$ to be a group-specific constant term in the regression model. It should be noted that the term "fixed" as used here indicates that the term does not vary over time, not that it is nonstochastic, which need not be the case.

3. Random Effects: If the unobserved individual heterogeneity, however formulated, can be assumed to be uncorrelated with the included variables, then the model may be formulated as
$$y_{it} = x_{it}'\beta + E[z_i'\alpha] + \{z_i'\alpha - E[z_i'\alpha]\} + \varepsilon_{it} = x_{it}'\beta + \alpha + u_i + \varepsilon_{it},$$
that is, as a linear regression model with a compound disturbance that may be consistently, albeit inefficiently, estimated by least squares. This random effects approach specifies that $u_i$ is a group specific random element, similar to $\varepsilon_{it}$ except that for each group, there is but a single draw that enters the regression identically in each period. Again, the crucial distinction between these two cases is whether the unobserved individual effect embodies elements that are correlated with the regressors in the model, not whether these effects are stochastic or not. We will examine this basic formulation, then consider an extension to a dynamic model.

4. Random Parameters: The random effects model can be viewed as a regression model with a random constant term. With a sufficiently rich data set, we may extend this idea to a model in which the other coefficients vary randomly across individuals as well. The extension of the model might appear as
$$y_{it} = x_{it}'(\beta + h_i) + (\alpha + u_i) + \varepsilon_{it},$$
where $h_i$ is a random vector which induces the variation of the parameters across individuals. This random parameters model was proposed quite early in this literature, but has only fairly recently enjoyed widespread attention in several fields. It represents a natural extension in which researchers broaden the amount of heterogeneity across individuals while retaining some commonalities: the parameter vectors still share a common mean. Some recent applications have extended this yet another step by allowing the mean value of the parameter distribution to be person-specific, as in
$$y_{it} = x_{it}'(\beta + \Delta z_i + h_i) + (\alpha + u_i) + \varepsilon_{it},$$
where $z_i$ is a set of observable, person specific variables, and $\Delta$ is a matrix of parameters to be estimated. As we will examine later, this hierarchical model is extremely versatile.

5. Covariance Structures: Lastly, we will reconsider the source of the heterogeneity in the model. In some settings, researchers have concluded that a preferable approach to modeling heterogeneity in the regression model is to layer it into the variation around the conditional mean, rather than in the placement of the mean. In a cross-country comparison of economic performance over time, Alvarez, Garrett, and Lange (1991) estimated a model of the form
$$y_{it} = f(\text{labor organization}_{it}, \text{political organization}_{it}) + \varepsilon_{it}$$
in which the regression function was fully specified by the linear part, $x_{it}'\beta + \alpha$, but the variance of $\varepsilon_{it}$ differed across countries. Beck et al. (1993) found evidence that the substantive conclusions of the study were dependent on the stochastic specification and on the methods used for estimation.

Example 13.1 Cost Function for Airline Production

To illustrate the computations for the various panel data models, we will revisit the airline cost data used in Example 7.2. This is a panel data study of a group of U.S. airlines. We will fit a simple model for the total cost of production:

$$\ln \text{cost}_{it} = \beta_1 + \beta_2\ln \text{output}_{it} + \beta_3\ln \text{fuel price}_{it} + \beta_4\,\text{load factor}_{it} + \varepsilon_{it}.$$

Output is measured in "revenue passenger miles." The load factor is a rate of capacity utilization; it is the average rate at which seats on the airline's planes are filled. More complete models of costs include other factor prices (materials, capital) and, perhaps, a quadratic term in log output to allow for variable economies of scale. We have restricted the cost function to these few variables to provide a straightforward illustration.

Ordinary least squares regression produces the following results. Estimated standard errors are given in parentheses.

$$\ln \text{cost}_{it} = 9.5169\,(0.22924) + 0.88274\,(0.013255)\ln \text{output}_{it} + 0.45398\,(0.020304)\ln \text{fuel price}_{it} - 1.62751\,(0.34540)\,\text{load factor}_{it} + \varepsilon_{it}$$

$$R^2 = 0.9882898, \quad s^2 = 0.015528, \quad e'e = 1.335442193.$$

The results so far are what one might expect. There are substantial economies of scale; $\text{e.s.}_{it} = (1/0.88274) - 1 = 0.1329$. The fuel price and load factors affect costs in the predictable fashions as well. (Fuel prices differ because of different mixes of types of planes and regional differences in supply characteristics.)

13.3 FIXED EFFECTS

This formulation of the model assumes that differences across units can be captured in differences in the constant term.[5] Each $\alpha_i$ is treated as an unknown parameter to be estimated. Let $y_i$ and $X_i$ be the $T$ observations for the $i$th unit, $\mathbf{i}$ be a $T \times 1$ column of ones, and let $\varepsilon_i$ be the associated $T \times 1$ vector of disturbances. Then,

$$y_i = X_i\beta + \mathbf{i}\alpha_i + \varepsilon_i.$$

Collecting these terms gives

$$\begin{bmatrix} y_1 \\ y_2 \\ \vdots \\ y_n \end{bmatrix} = \begin{bmatrix} X_1 \\ X_2 \\ \vdots \\ X_n \end{bmatrix}\beta + \begin{bmatrix} \mathbf{i} & 0 & \cdots & 0 \\ 0 & \mathbf{i} & \cdots & 0 \\ \vdots & & \ddots & \vdots \\ 0 & 0 & \cdots & \mathbf{i} \end{bmatrix}\begin{bmatrix} \alpha_1 \\ \alpha_2 \\ \vdots \\ \alpha_n \end{bmatrix} + \begin{bmatrix} \varepsilon_1 \\ \varepsilon_2 \\ \vdots \\ \varepsilon_n \end{bmatrix}$$

or

$$y = [X \;\; d_1 \;\; d_2 \;\; \ldots \;\; d_n]\begin{bmatrix} \beta \\ \alpha \end{bmatrix} + \varepsilon, \tag{13-2}$$

where $d_i$ is a dummy variable indicating the $i$th unit. Let the $nT \times n$ matrix $D = [d_1 \; d_2 \; \ldots \; d_n]$. Then, assembling all $nT$ rows gives

$$y = X\beta + D\alpha + \varepsilon. \tag{13-3}$$

This model is usually referred to as the least squares dummy variable (LSDV) model (although the "least squares" part of the name refers to the technique usually used to estimate it, not to the model itself).

This model is a classical regression model, so no new results are needed to analyze it. If $n$ is small enough, then the model can be estimated by ordinary least squares with $K$ regressors in $X$ and $n$ columns in $D$, as a multiple regression with $K + n$ parameters. Of course, if $n$ is thousands, as is typical, then this model is likely to exceed the storage capacity of any computer. But, by using familiar results for a partitioned regression, we can reduce
the size of the computation.[6] We write the least squares estimator of $\beta$ as

$$b = [X'M_D X]^{-1}[X'M_D y], \tag{13-4}$$

where

$$M_D = I - D(D'D)^{-1}D'.$$

This amounts to a least squares regression using the transformed data $X_* = M_D X$ and $y_* = M_D y$.

[5] It is also possible to allow the slopes to vary across $i$, but this method introduces some new methodological issues, as well as considerable complexity in the calculations. A study on the topic is Cornwell and Schmidt (1984). Also, the assumption of a fixed $T$ is only for convenience. The more general case in which $T_i$ varies across units is considered later, in the exercises, and in Greene (1995a).

[6] See Theorem 3.3.

The structure of $D$ is particularly convenient; its columns are orthogonal, so

$$M_D = \begin{bmatrix} M^0 & 0 & 0 & \cdots & 0 \\ 0 & M^0 & 0 & \cdots & 0 \\ & & \ddots & & \\ 0 & 0 & 0 & \cdots & M^0 \end{bmatrix}.$$

Each matrix on the diagonal is

$$M^0 = I_T - \frac{1}{T}\mathbf{i}\mathbf{i}'. \tag{13-5}$$

Premultiplying any $T \times 1$ vector $z_i$ by $M^0$ creates $M^0 z_i = z_i - \bar{z}_i\mathbf{i}$. (Note that the mean is taken over only the $T$ observations for unit $i$.) Therefore, the least squares regression of $M_D y$ on $M_D X$ is equivalent to a regression of $[y_{it} - \bar{y}_{i.}]$ on $[x_{it} - \bar{x}_{i.}]$, where $\bar{y}_{i.}$ and $\bar{x}_{i.}$ are the scalar and $K \times 1$ vector of means of $y_{it}$ and $x_{it}$ over the $T$ observations for group $i$.[7] The dummy variable coefficients can be recovered from the other normal equation in the partitioned regression:

$$D'Da + D'Xb = D'y$$

or

$$a = [D'D]^{-1}D'(y - Xb).$$

This implies that for each $i$,

$$a_i = \bar{y}_{i.} - b'\bar{x}_{i.}. \tag{13-6}$$

The appropriate estimator of the asymptotic covariance matrix for $b$ is

$$\text{Est.Asy.Var}[b] = s^2[X'M_D X]^{-1}, \tag{13-7}$$

which uses the second moment matrix with $x$'s now expressed as deviations from their respective group means. The disturbance variance estimator is

$$s^2 = \frac{\sum_{i=1}^{n}\sum_{t=1}^{T}(y_{it} - x_{it}'b - a_i)^2}{nT - n - K} = \frac{(M_D y - M_D Xb)'(M_D y - M_D Xb)}{nT - n - K}. \tag{13-8}$$

The $it$th residual used in this computation is

$$e_{it} = y_{it} - x_{it}'b - a_i = y_{it} - x_{it}'b - (\bar{y}_{i.} - \bar{x}_{i.}'b) = (y_{it} - \bar{y}_{i.}) - (x_{it} - \bar{x}_{i.})'b.$$

Thus, the numerator in $s^2$ is exactly the sum of squared residuals using the least squares slopes and the data in group mean deviation form. But, done in this fashion, one might then use $nT - K$ instead of $nT - n - K$ for the denominator in computing $s^2$, so a correction would be necessary. For the individual effects,

$$\text{Asy.Var}[a_i] = \frac{\sigma^2}{T} + \bar{x}_{i.}'\,\text{Asy.Var}[b]\,\bar{x}_{i.},$$

so a simple estimator based on $s^2$ can be computed.

[7] An interesting special case arises if $T = 2$. In the two-period case, you can show (we leave it as an exercise) that this least squares regression is done with $nT/2$ first difference observations, by regressing observation $(y_{i2} - y_{i1})$ (and its negative) on $(x_{i2} - x_{i1})$ (and its negative).

13.3.1 TESTING THE SIGNIFICANCE OF THE GROUP EFFECTS

The $t$ ratio for $a_i$ can be used for a test of the hypothesis that $\alpha_i$ equals zero. This hypothesis about one specific group, however, is typically not useful for testing in this regression context. If we are interested in differences across groups, then we can test the hypothesis that the constant terms are all equal with an $F$ test. Under the null hypothesis of equality, the efficient estimator is pooled least squares. The $F$ ratio used for this test is

$$F(n-1,\; nT-n-K) = \frac{(R^2_{LSDV} - R^2_{Pooled})/(n-1)}{(1 - R^2_{LSDV})/(nT-n-K)}, \tag{13-9}$$

where LSDV indicates the dummy variable model and Pooled indicates the pooled or restricted model with only a single overall constant term. Alternatively, the model may have been estimated with an overall constant and $n - 1$ dummy variables instead. All other results (i.e., the least squares slopes, $s^2$, $R^2$) will be unchanged, but rather than estimate $\alpha_i$, each dummy variable coefficient will now be an estimate of $\alpha_i - \alpha_1$, where group "1" is the omitted group. The $F$ test that the coefficients on these $n - 1$ dummy variables are zero is identical to the one above. It is important to keep in mind, however, that although the statistical results are the same, the interpretation of the dummy variable coefficients in the two formulations is different.[8]

13.3.2 THE WITHIN- AND BETWEEN-GROUPS ESTIMATORS

We can formulate a pooled regression model in three ways. First, the original formulation is

$$y_{it} = x_{it}'\beta + \alpha + \varepsilon_{it}. \tag{13-10a}$$

In terms of deviations from the group means,

$$y_{it} - \bar{y}_{i.} = (x_{it} - \bar{x}_{i.})'\beta + \varepsilon_{it} - \bar\varepsilon_{i.}, \tag{13-10b}$$

while in terms of the group means,

$$\bar{y}_{i.} = \bar{x}_{i.}'\beta + \alpha + \bar\varepsilon_{i.}. \tag{13-10c}$$

All three are classical regression models, and in principle, all three could be estimated, at least consistently if not efficiently, by ordinary least squares. [Note that (13-10c) involves only $n$ observations, the group means.] Consider then the matrices of sums of squares and cross products that would be used in each case, where we focus only on estimation of $\beta$. In (13-10a), the moments would accumulate variation about the overall means, $\bar{\bar{y}}$ and $\bar{\bar{x}}$, and we would use the total sums of squares and cross
products,

$$S^{total}_{xx} = \sum_{i=1}^{n}\sum_{t=1}^{T}(x_{it} - \bar{\bar{x}})(x_{it} - \bar{\bar{x}})' \quad\text{and}\quad S^{total}_{xy} = \sum_{i=1}^{n}\sum_{t=1}^{T}(x_{it} - \bar{\bar{x}})(y_{it} - \bar{\bar{y}}).$$

For (13-10b), since the data are in deviations already, the means of $(y_{it} - \bar{y}_{i.})$ and $(x_{it} - \bar{x}_{i.})$ are zero. The moment matrices are within-groups (i.e., variation around group means) sums of squares and cross products,

$$S^{within}_{xx} = \sum_{i=1}^{n}\sum_{t=1}^{T}(x_{it} - \bar{x}_{i.})(x_{it} - \bar{x}_{i.})' \quad\text{and}\quad S^{within}_{xy} = \sum_{i=1}^{n}\sum_{t=1}^{T}(x_{it} - \bar{x}_{i.})(y_{it} - \bar{y}_{i.}).$$

[8] For a discussion of the differences, see Suits (1984).

Finally, for (13-10c), the mean of group means is the overall mean. The moment matrices are the between-groups sums of squares and cross products, that is, the variation of the group means around the overall means:

$$S^{between}_{xx} = \sum_{i=1}^{n}T(\bar{x}_{i.} - \bar{\bar{x}})(\bar{x}_{i.} - \bar{\bar{x}})' \quad\text{and}\quad S^{between}_{xy} = \sum_{i=1}^{n}T(\bar{x}_{i.} - \bar{\bar{x}})(\bar{y}_{i.} - \bar{\bar{y}}).$$

It is easy to verify that

$$S^{total}_{xx} = S^{within}_{xx} + S^{between}_{xx} \quad\text{and}\quad S^{total}_{xy} = S^{within}_{xy} + S^{between}_{xy}.$$

Therefore, there are three possible least squares estimators of $\beta$ corresponding to the decomposition. The least squares estimator is

$$b^{total} = [S^{total}_{xx}]^{-1}S^{total}_{xy} = [S^{within}_{xx} + S^{between}_{xx}]^{-1}[S^{within}_{xy} + S^{between}_{xy}]. \tag{13-11}$$

The within-groups estimator is

$$b^{within} = [S^{within}_{xx}]^{-1}S^{within}_{xy}. \tag{13-12}$$

This is the LSDV estimator computed earlier. [See (13-4).] An alternative estimator would be the between-groups estimator,

$$b^{between} = [S^{between}_{xx}]^{-1}S^{between}_{xy} \tag{13-13}$$

(sometimes called the group means estimator). This least squares estimator of (13-10c) is based on the $n$ sets of group means. (Note that we are assuming that $n$ is at least as large as $K$.) From the preceding expressions (and familiar previous results),

$$S^{within}_{xy} = S^{within}_{xx}b^{within} \quad\text{and}\quad S^{between}_{xy} = S^{between}_{xx}b^{between}.$$

Inserting these in (13-11), we see that the least squares estimator is a matrix weighted average of the within- and between-groups estimators:

$$b^{total} = F^{within}b^{within} + F^{between}b^{between}, \tag{13-14}$$

where

$$F^{within} = [S^{within}_{xx} + S^{between}_{xx}]^{-1}S^{within}_{xx} = I - F^{between}.$$

The form of this result resembles the Bayesian estimator in the classical model discussed in Section 16.2. The resemblance is more than passing; it can be shown [see, e.g., Judge (1985)] that

$$F^{within} = \{[\text{Asy.Var}(b^{within})]^{-1} + [\text{Asy.Var}(b^{between})]^{-1}\}^{-1}[\text{Asy.Var}(b^{within})]^{-1},$$

which is essentially the same mixing result we have for the Bayesian estimator. In the weighted average, the estimator with the smaller variance receives the greater weight.

13.3.3 FIXED TIME AND GROUP EFFECTS

The least squares dummy variable approach can be extended to include a time-specific effect as well. One way to formulate the extended model is simply to add the time effect, as in

$$y_{it} = x_{it}'\beta + \alpha_i + \gamma_t + \varepsilon_{it}. \tag{13-15}$$

This model is obtained from the preceding one by the inclusion of an additional $T - 1$ dummy variables. (One of the time effects must be dropped to avoid perfect collinearity: the group effects and time effects both sum to one.) If the number of variables is too large to handle by ordinary regression, then this model can also be estimated by using the partitioned regression.[9] There is an asymmetry in this formulation, however, since each of the group effects is a group-specific intercept, whereas the time effects are contrasts, that is, comparisons to a base period (the one that is excluded). A symmetric form of the model is

$$y_{it} = x_{it}'\beta + \mu + \alpha_i + \gamma_t + \varepsilon_{it}, \tag{13-15$'$}$$

where a full $n$ and $T$ effects are included, but the restrictions

$$\sum_i \alpha_i = \sum_t \gamma_t = 0$$

are imposed. Least squares estimates of the slopes in this model are obtained by regression of

$$y_{*it} = y_{it} - \bar{y}_{i.} - \bar{y}_{.t} + \bar{\bar{y}} \tag{13-16}$$

on

$$x_{*it} = x_{it} - \bar{x}_{i.} - \bar{x}_{.t} + \bar{\bar{x}},$$

where the period-specific and overall means are

$$\bar{y}_{.t} = \frac{1}{n}\sum_{i=1}^{n}y_{it} \quad\text{and}\quad \bar{\bar{y}} = \frac{1}{nT}\sum_{i=1}^{n}\sum_{t=1}^{T}y_{it},$$

and likewise for $\bar{x}_{.t}$ and $\bar{\bar{x}}$. The overall constant and the dummy variable coefficients can then be recovered from the normal equations as

$$\hat\mu = m = \bar{\bar{y}} - \bar{\bar{x}}'b,$$
$$\hat\alpha_i = a_i = (\bar{y}_{i.} - \bar{\bar{y}}) - (\bar{x}_{i.} - \bar{\bar{x}})'b, \tag{13-17}$$
$$\hat\gamma_t = c_t = (\bar{y}_{.t} - \bar{\bar{y}}) - (\bar{x}_{.t} - \bar{\bar{x}})'b.$$

[9] The matrix algebra and the theoretical development of two-way effects in panel data models are complex. See, for example, Baltagi (1995). Fortunately, the practical application is much simpler. The number of periods analyzed in most panel data sets is rarely more than a handful. Since modern computer programs, even those written strictly for microcomputers, uniformly allow dozens (or even hundreds) of regressors, almost any application involving a second fixed effect can be handled just by literally including the second effect as a set of actual dummy variables.

The estimated asymptotic covariance matrix for $b$ is computed using the sums of squares and cross products of $x_{*it}$ computed in (13-16) and

$$s^2 = \frac{\sum_{i=1}^{n}\sum_{t=1}^{T}(y_{it} - x_{it}'b - m - a_i - c_t)^2}{nT - (n-1) - (T-1) - K - 1}.$$

If one of $n$ or $T$ is small and the other is large, then it may be simpler just to treat the smaller set as an ordinary set of variables and apply the previous results to the one-way fixed effects model defined by the larger set. Although more general, this model is infrequently used in practice. There are two reasons. First, the cost in terms of degrees of freedom is often not justified. Second, in those instances in which a model of the timewise evolution of the disturbance is desired, a more general model than this simple dummy variable formulation is usually used.

Example 13.2 Fixed Effects Regressions

Table 13.1 contains the estimated cost equations with individual firm effects, specific period effects, and both firm and period effects. For comparison, the least squares and group means results are given also. The $F$ statistic for testing the joint significance of the firm effects is

$$F[5, 81] = \frac{(0.997434 - 0.98829)/5}{(1 - 0.997431)/81} = 57.614.$$

The critical value from the $F$ table is 2.327, so the evidence is strongly in favor of a firm specific effect in the data. The same computation for the time effects, in the absence of the firm effects, produces an $F[14, 72]$ statistic of 1.170, which is considerably less than the 95 percent critical value of 1.832. Thus, on this basis, there does not appear to be a significant cost difference across the different periods that is not accounted for by the fuel price variable, output, and load factors. There is a distinctive pattern to the time effects, which we will examine more closely later. In the presence of the firm effects, the $F[14, 67]$ ratio for the joint significance of the period effects is 3.149, which is larger than the table value of 1.842.

TABLE 13.1 Cost Equations with Fixed Firm and Period Effects

Parameter Estimates

Specification            β1          β2           β3           β4          R²        s²
No effects               9.517       0.88274      0.45398     −1.6275     0.98829   0.015528
                        (0.22924)   (0.013255)   (0.020304)   (0.34530)
Group means             85.809       0.78246     −5.5240      −1.7510     0.99364   0.015838
                       (56.483)     (0.10877)    (4.47879)    (2.74319)
Firm effects                         0.91928      0.41749     −1.07040    0.99743   0.003625
                                    (0.029890)   (0.015199)   (0.20169)
  a1 ... a6:    9.706  9.665  9.497  9.891  9.730  9.793
Time effects                         0.86773     −0.48448     −1.95440    0.99046   0.016705
                                    (0.015408)   (0.36411)    (0.44238)
  c1 ... c8:    20.496  20.578  20.656  20.741  21.200  21.411  21.503  21.654
  c9 ... c15:   21.829  22.114  22.465  22.651  22.616  22.552  22.537
Firm and time effects   12.667       0.81725      0.16861     −0.88281    0.99845   0.002727
                        (2.0811)    (0.031851)   (0.16348)    (0.26174)
  a1 ... a6:    0.12833  0.06549  −0.18947  0.13425  −0.09265  −0.04596
  c1 ... c8:    −0.37402  −0.31932  −0.27669  −0.22304  −0.15393  −0.10809  −0.07686  −0.02073
  c9 ... c15:   0.04722  0.09173  0.20731  0.28547  0.30138  0.30047  0.31911

13.3.4 UNBALANCED PANELS AND FIXED EFFECTS

Missing data are very common in panel data sets. For this reason, or perhaps just because of the way the data were recorded, panels in which the group sizes differ across groups are not unusual. These panels are called unbalanced panels. The preceding analysis assumed equal group sizes and relied on the assumption at several points. A modification to allow unequal group sizes is quite simple. First, the full sample size is $\sum_{i=1}^{n}T_i$ instead of $nT$, which calls for minor modifications in the computations of $s^2$, $\text{Var}[b]$, $\text{Var}[a_i]$, and the $F$ statistic. Second, group means must be based on $T_i$, which varies across groups. The overall means for the regressors are

$$\bar{\bar{x}} = \frac{\sum_{i=1}^{n}\sum_{t=1}^{T_i}x_{it}}{\sum_{i=1}^{n}T_i} = \frac{\sum_{i=1}^{n}T_i\bar{x}_{i.}}{\sum_{i=1}^{n}T_i} = \sum_{i=1}^{n}f_i\bar{x}_{i.},$$

where $f_i = T_i/(\sum_{i=1}^{n}T_i)$. If the group sizes are equal, then $f_i = 1/n$. The within groups moment matrix shown in (13-4),

$$S^{within}_{xx} = X'M_D X,$$

is

$$\sum_{i=1}^{n}X_i'M_i^0X_i = \sum_{i=1}^{n}\left[\sum_{t=1}^{T_i}(x_{it} - \bar{x}_{i.})(x_{it} - \bar{x}_{i.})'\right].$$

The other moments, $S^{within}_{xy}$ and $S^{within}_{yy}$, are computed likewise. No other changes are necessary for the one factor LSDV estimator. The two-way model can be handled likewise, although with unequal group sizes in both directions, the algebra becomes fairly cumbersome. Once again, however, the practice is much simpler than the theory. The easiest approach for unbalanced panels is just to create the full set of $T$ dummy variables using as $T$ the union of the dates represented in the full data set. One (presumably the last) is dropped, so we revert back to (13-15). Then, within each group, any of the $T$ periods represented is accounted for by using one of the dummy variables. Least squares using the LSDV approach for the group effects will then automatically take care of the messy accounting details.

13.4 RANDOM EFFECTS

The fixed effects model allows the unobserved individual effects to be correlated with the included variables. We then modeled the differences betw
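een units strictly as parametric shifts of the regression function. This model might be viewed as applying only to the cross-sectional units in the study, not to additional ones outside the sample. For example, an intercountry comparison may well include the full set of countries for which it is reasonable to assume that the model is constant. If the individual effects are strictly uncorrelated with the regressors, then it might be appropriate to model the individual specific constant terms as randomly distributed across cross-sectional units. This view would be appropriate if we believed that sampled cross-sectional units were drawn from a large population. It would certainly be the case for the longitudinal data sets listed in the introduction to this chapter.[10] The payoff to this form is that it greatly reduces the number of parameters to be estimated. The cost is the possibility of inconsistent estimates, should the assumption turn out to be inappropriate.

Consider, then, a reformulation of the model

$$y_{it} = \mathbf{x}_{it}'\boldsymbol{\beta} + (\alpha + u_i) + \varepsilon_{it}, \tag{13-18}$$

where there are K regressors including a constant and now the single constant term is the mean of the unobserved heterogeneity, $E[\mathbf{z}_i'\boldsymbol{\alpha}]$. The component $u_i$ is the random heterogeneity specific to the ith observation and is constant through time; recall from Section 13.2, $u_i = \mathbf{z}_i'\boldsymbol{\alpha} - E[\mathbf{z}_i'\boldsymbol{\alpha}]$. For example, in an analysis of families, we can view $u_i$ as the collection of factors, $\mathbf{z}_i'\boldsymbol{\alpha}$, not in the regression that are specific to that family. We assume further that

$$\begin{aligned}
E[\varepsilon_{it} \mid \mathbf{X}] &= E[u_i \mid \mathbf{X}] = 0,\\
E[\varepsilon_{it}^2 \mid \mathbf{X}] &= \sigma_\varepsilon^2,\\
E[u_i^2 \mid \mathbf{X}] &= \sigma_u^2,\\
E[\varepsilon_{it}u_j \mid \mathbf{X}] &= 0 \text{ for all } i, t, \text{ and } j,\\
E[\varepsilon_{it}\varepsilon_{js} \mid \mathbf{X}] &= 0 \text{ if } t \neq s \text{ or } i \neq j,\\
E[u_iu_j \mid \mathbf{X}] &= 0 \text{ if } i \neq j.
\end{aligned} \tag{13-19}$$

As before, it is useful to view the formulation of the model in blocks of T observations for group i, $\mathbf{y}_i$, $\mathbf{X}_i$, $u_i\mathbf{i}$, and $\boldsymbol{\varepsilon}_i$. For these T observations, let

$$\eta_{it} = \varepsilon_{it} + u_i$$

and

$$\boldsymbol{\eta}_i = [\eta_{i1}, \eta_{i2}, \ldots, \eta_{iT}]'.$$

In view of this form of $\eta_{it}$, we have what is often called an "error components model." For this model,

$$\begin{aligned}
E[\eta_{it}^2 \mid \mathbf{X}] &= \sigma_\varepsilon^2 + \sigma_u^2,\\
E[\eta_{it}\eta_{is} \mid \mathbf{X}] &= \sigma_u^2, \quad t \neq s,\\
E[\eta_{it}\eta_{js} \mid \mathbf{X}] &= 0 \text{ for all } t \text{ and } s \text{ if } i \neq j.
\end{aligned}$$

For the T observations for unit i, let $\boldsymbol{\Sigma} = E[\boldsymbol{\eta}_i\boldsymbol{\eta}_i' \mid \mathbf{X}]$. Then

$$\boldsymbol{\Sigma} = \begin{bmatrix}
\sigma_\varepsilon^2 + \sigma_u^2 & \sigma_u^2 & \sigma_u^2 & \cdots & \sigma_u^2\\
\sigma_u^2 & \sigma_\varepsilon^2 + \sigma_u^2 & \sigma_u^2 & \cdots & \sigma_u^2\\
 & & \cdots & & \\
\sigma_u^2 & \sigma_u^2 & \sigma_u^2 & \cdots & \sigma_\varepsilon^2 + \sigma_u^2
\end{bmatrix} = \sigma_\varepsilon^2\mathbf{I}_T + \sigma_u^2\mathbf{i}_T\mathbf{i}_T', \tag{13-20}$$

[10] This distinction is not hard and fast; it is purely heuristic. We shall return to this issue later. See Mundlak (1978) for methodological discu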
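ssion of the distinction between fixed and random effects.

where $\mathbf{i}_T$ is a $T \times 1$ column vector of 1s. Since observations i and j are independent, the disturbance covariance matrix for the full nT observations is

$$\boldsymbol{\Omega} = \begin{bmatrix}
\boldsymbol{\Sigma} & \mathbf{0} & \mathbf{0} & \cdots & \mathbf{0}\\
\mathbf{0} & \boldsymbol{\Sigma} & \mathbf{0} & \cdots & \mathbf{0}\\
 & & \vdots & & \\
\mathbf{0} & \mathbf{0} & \mathbf{0} & \cdots & \boldsymbol{\Sigma}
\end{bmatrix} = \mathbf{I}_n \otimes \boldsymbol{\Sigma}. \tag{13-21}$$

13.4.1 GENERALIZED LEAST SQUARES

The generalized least squares estimator of the slope parameters is

$$\hat{\boldsymbol{\beta}} = (\mathbf{X}'\boldsymbol{\Omega}^{-1}\mathbf{X})^{-1}\mathbf{X}'\boldsymbol{\Omega}^{-1}\mathbf{y} = \left(\sum_{i=1}^{n}\mathbf{X}_i'\boldsymbol{\Sigma}^{-1}\mathbf{X}_i\right)^{-1}\left(\sum_{i=1}^{n}\mathbf{X}_i'\boldsymbol{\Sigma}^{-1}\mathbf{y}_i\right).$$

To compute this estimator as we did in Chapter 10 by transforming the data and using ordinary least squares with the transformed data, we will require $\boldsymbol{\Omega}^{-1/2} = [\mathbf{I}_n \otimes \boldsymbol{\Sigma}]^{-1/2}$. We need only find $\boldsymbol{\Sigma}^{-1/2}$, which is

$$\boldsymbol{\Sigma}^{-1/2} = \frac{1}{\sigma_\varepsilon}\left[\mathbf{I} - \frac{\theta}{T}\mathbf{i}_T\mathbf{i}_T'\right],$$

where

$$\theta = 1 - \frac{\sigma_\varepsilon}{\sqrt{\sigma_\varepsilon^2 + T\sigma_u^2}}.$$

The transformation of $\mathbf{y}_i$ and $\mathbf{X}_i$ for GLS is therefore

$$\boldsymbol{\Sigma}^{-1/2}\mathbf{y}_i = \frac{1}{\sigma_\varepsilon}\begin{bmatrix}
y_{i1} - \theta\bar{y}_{i.}\\
y_{i2} - \theta\bar{y}_{i.}\\
\vdots\\
y_{iT} - \theta\bar{y}_{i.}
\end{bmatrix}, \tag{13-22}$$

and likewise for the rows of $\mathbf{X}_i$.[11] For the data set as a whole, then, generalized least squares is computed by the regression of these partial deviations of $y_{it}$ on the same transformations of $\mathbf{x}_{it}$. Note the similarity of this procedure to the computation in the LSDV model, which uses θ = 1. (One could interpret θ as the effect that would remain if $\sigma_\varepsilon$ were zero, because the only effect would then be $u_i$. In this case, the fixed and random effects models would be indistinguishable, so this result makes sense.)

It can be shown that the GLS estimator is, like the OLS estimator, a matrix weighted average of the within- and between-units estimators:[12]

$$\hat{\boldsymbol{\beta}} = \hat{\mathbf{F}}^{within}\mathbf{b}^{within} + (\mathbf{I} - \hat{\mathbf{F}}^{within})\mathbf{b}^{between}, \tag{13-23}$$

where now,

$$\hat{\mathbf{F}}^{within} = \left[\mathbf{S}_{xx}^{within} + \lambda\mathbf{S}_{xx}^{between}\right]^{-1}\mathbf{S}_{xx}^{within},$$

$$\lambda = \frac{\sigma_\varepsilon^2}{\sigma_\varepsilon^2 + T\sigma_u^2} = (1 - \theta)^2.$$

[11] This transformation is a special case of the more general treatment in Nerlove (1971b).
[12] An alternative form of this expression, in which the weighting matrices are proportional to the covariance matrices of the two estimators, is given by Judge et al. (1985).

To the extent that λ differs from one, we see that the inefficiency of least squares will follow from an inefficient weighting of the two estimators. Compared with generalized least squares, ordinary least squares places too much weight on the between-units variation. It includes it all in the variation in X, rather than apportioning some of it to random variation across groups attributable to the variation in $u_i$ across units.

There a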
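re some polar cases to consider. If λ equals 1, then generalized least squares is identical to ordinary least squares. This situation would occur if $\sigma_u^2$ were zero, in which case a classical regression model would apply. If λ equals zero, then the estimator is the dummy variable estimator we used in the fixed effects setting. There are two possibilities. If $\sigma_\varepsilon^2$ were zero, then all variation across units would be due to the different $u_i$s, which, because they are constant across time, would be equivalent to the dummy variables we used in the fixed-effects model. The question of whether they were fixed or random would then become moot. They are the only source of variation across units once the regression is accounted for. The other case is T → ∞. We can view it this way: If T → ∞, then the unobserved $u_i$ becomes observable. Take the T observations for the ith unit. Our estimator of [α, β] is consistent in the dimensions T or n. Therefore,

$$y_{it} - \mathbf{x}_{it}'\boldsymbol{\beta} - \alpha = u_i + \varepsilon_{it}$$

becomes observable. The individual means will provide

$$\bar{y}_{i.} - \bar{\mathbf{x}}_{i.}'\boldsymbol{\beta} - \alpha = u_i + \bar{\varepsilon}_{i.}.$$

But $\bar{\varepsilon}_{i.}$ converges to zero, which reveals $u_i$ to us. Therefore, if T goes to infinity, $u_i$ becomes the $\alpha_i d_i$ we used earlier.

Unbalanced panels add a layer of difficulty in the random effects model. The first problem can be seen in (13-21). The matrix $\boldsymbol{\Omega}$ is no longer $\mathbf{I} \otimes \boldsymbol{\Sigma}$ because the diagonal blocks in $\boldsymbol{\Omega}$ are of different sizes. There is also groupwise heteroscedasticity, because the ith diagonal block in $\boldsymbol{\Omega}^{-1/2}$ is

$$\boldsymbol{\Sigma}_i^{-1/2} = \mathbf{I}_{T_i} - \frac{\theta_i}{T_i}\mathbf{i}_{T_i}\mathbf{i}_{T_i}', \qquad \theta_i = 1 - \frac{\sigma_\varepsilon}{\sqrt{\sigma_\varepsilon^2 + T_i\sigma_u^2}}.$$

In principle, estimation is still straightforward, since the source of the groupwise heteroscedasticity is only the unequal group sizes. Thus, for GLS, or FGLS with estimated variance components, it is necessary only to use the group specific $\theta_i$ in the transformation in (13-22).

13.4.2 FEASIBLE GENERALIZED LEAST SQUARES WHEN Σ IS UNKNOWN

If the variance components are known, generalized least squares can be computed as shown earlier. Of course, this is unlikely, so as usual, we must first estimate the disturbance variances and then use an FGLS procedure. A heuristic approach to estimation of the variance components is as follows:

$$y_{it} = \mathbf{x}_{it}'\boldsymbol{\beta} + \alpha + \varepsilon_{it} + u_i \tag{13-24}$$

and

$$\bar{y}_{i.} = \bar{\mathbf{x}}_{i.}'\boldsymbol{\beta} + \alpha + \bar{\varepsilon}_{i.} + u_i.$$

Therefore, taking deviations from the group means removes the heterogeneity:

$$y_{it} - \bar{y}_{i.} = [\mathbf{x}_{it} - \bar{\mathbf{x}}_{i.}]'\boldsymbol{\beta} + [\varepsilon_{it} - \bar{\varepsilon}_{i.}]. \tag{13-25}$$

Sinc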
e $E\left[\sum_{t=1}^{T}(\varepsilon_{it} - \bar{\varepsilon}_{i.})^2\right] = (T-1)\sigma_\varepsilon^2$, if β were observed, then an unbiased estimator of $\sigma_\varepsilon^2$ based on T observations in group i would be

$$\hat{\sigma}_\varepsilon^2(i) = \frac{\sum_{t=1}^{T}(\varepsilon_{it} - \bar{\varepsilon}_{i.})^2}{T-1}. \tag{13-26}$$

Since β must be estimated [(13-25) implies that the LSDV estimator is consistent, indeed, unbiased in general], we make the degrees of freedom correction and use the LSDV residuals in

$$s_e^2(i) = \frac{\sum_{t=1}^{T}(e_{it} - \bar{e}_{i.})^2}{T-K-1}. \tag{13-27}$$

We have n such estimators, so we average them to obtain

$$\bar{s}_e^2 = \frac{1}{n}\sum_{i=1}^{n}s_e^2(i) = \frac{1}{n}\sum_{i=1}^{n}\left[\frac{\sum_{t=1}^{T}(e_{it} - \bar{e}_{i.})^2}{T-K-1}\right] = \frac{\sum_{i=1}^{n}\sum_{t=1}^{T}(e_{it} - \bar{e}_{i.})^2}{nT - nK - n}. \tag{13-28}$$

The degrees of freedom correction in $\bar{s}_e^2$ is excessive because it assumes that α and β are reestimated for each i. The estimated parameters are the n means $\bar{y}_{i.}$ and the K slopes. Therefore, we propose the unbiased estimator[13]

$$\hat{\sigma}_\varepsilon^2 = s_{LSDV}^2 = \frac{\sum_{i=1}^{n}\sum_{t=1}^{T}(e_{it} - \bar{e}_{i.})^2}{nT - n - K}. \tag{13-29}$$

This is the variance estimator in the LSDV model in (13-8), appropriately corrected for degrees of freedom.

It remains to estimate $\sigma_u^2$. Return to the original model specification in (13-24). In spite of the correlation across observations, this is a classical regression model in which the ordinary least squares slopes and variance estimators are both consistent and, in most cases, unbiased. Therefore, using the ordinary least squares residuals from the model with only a single overall constant, we have

$$\text{plim } s_{Pooled}^2 = \text{plim } \frac{\mathbf{e}'\mathbf{e}}{nT-K-1} = \sigma_\varepsilon^2 + \sigma_u^2. \tag{13-30}$$

[13] A formal proof of this proposition may be found in Maddala (1971) or in Judge et al. (1985, p. 551).

This provides the two estimators needed for the variance components; the second would be $\hat{\sigma}_u^2 = s_{Pooled}^2 - s_{LSDV}^2$. A possible complication is that this second estimator could be negative. But, recall that for feasible generalized least squares, we do not need an unbiased estimator of the variance, only a consistent one. As such, we may drop the degrees of freedom corrections in (13-29) and (13-30). If so, then the two variance estimators must be nonnegative, since the sum of squares in the LSDV model cannot be larger than that in the simple regression with only one constant term. Alternative estimators have been proposed, all based on this principle of using two different sums of squared residuals.[14]

There is a remaining complication. If there are any regressors that do not vary within the groups, the LSDV estimator cannot b
e computed. For example, in a model of family income or labor supply, one of the regressors might be a dummy variable for location, family structure, or living arrangement. Any of these could be perfectly collinear with the fixed effect for that family, which would prevent computation of the LSDV estimator. In this case, it is still possible to estimate the random effects variance components. Let [b, a] be any consistent estimator of [β, α], such as the ordinary least squares estimator. Then, (13-30) provides a consistent estimator of $m_{ee} = \sigma_\varepsilon^2 + \sigma_u^2$. The mean squared residuals using a regression based only on the n group means provides a consistent estimator of $m_{**} = \sigma_u^2 + (\sigma_\varepsilon^2/T)$, so we can use

$$\hat{\sigma}_\varepsilon^2 = \frac{T}{T-1}(m_{ee} - m_{**}),$$

$$\hat{\sigma}_u^2 = \frac{T}{T-1}m_{**} - \frac{1}{T-1}m_{ee} = \omega m_{**} + (1 - \omega)m_{ee},$$

where ω > 1. As before, this estimator can produce a negative estimate of $\sigma_u^2$ that, once again, calls the specification of the model into question. [Note, finally, that the residuals in (13-29) and (13-30) could be based on the same coefficient vector.]

[14] See, for example, Wallace and Hussain (1969), Maddala (1971), Fuller and Battese (1974), and Amemiya (1971).

13.4.3 TESTING FOR RANDOM EFFECTS

Breusch and Pagan (1980) have devised a Lagrange multiplier test for the random effects model based on the OLS residuals.[15] For

$$H_0: \sigma_u^2 = 0 \quad (\text{or Corr}[\eta_{it}, \eta_{is}] = 0),$$
$$H_1: \sigma_u^2 \neq 0,$$

the test statistic is

$$LM = \frac{nT}{2(T-1)}\left[\frac{\sum_{i=1}^{n}\left(\sum_{t=1}^{T}e_{it}\right)^2}{\sum_{i=1}^{n}\sum_{t=1}^{T}e_{it}^2} - 1\right]^2 = \frac{nT}{2(T-1)}\left[\frac{\sum_{i=1}^{n}(T\bar{e}_{i.})^2}{\sum_{i=1}^{n}\sum_{t=1}^{T}e_{it}^2} - 1\right]^2. \tag{13-31}$$

Under the null hypothesis, LM is distributed as chi-squared with one degree of freedom.

[15] We have focused thus far strictly on generalized least squares and moments based consistent estimation of the variance components. The LM test is based on maximum likelihood estimation, instead. See Maddala (1971) and Balestra and Nerlove (1966, 2003) for this approach to estimation.

Example 13.3 Testing for Random Effects

The least squares estimates for the cost equation were given in Example 13.1. The firm specific means of the least squares residuals are

$$\bar{\mathbf{e}} = [0.068869, -0.013878, -0.19422, 0.15273, -0.021583, 0.0080906]'.$$

The total sum of squared residuals for the least squares regression is $\mathbf{e}'\mathbf{e} = 1.33544$, so

$$LM = \frac{nT}{2(T-1)}\left[\frac{T^2\bar{\mathbf{e}}'\bar{\mathbf{e}}}{\mathbf{e}'\mathbf{e}} - 1\right]^2 = 334.85.$$

Based on the least squares residuals, we obtain a Lagrange multiplier test statistic of 334.85, which far e
xceeds the 95 percent critical value for chi-squared with one degree of freedom, 3.84. At this point, we conclude that the classical regression model with a single constant term is inappropriate for these data. The result of the test is to reject the null hypothesis in favor of the random effects model. But, it is best to reserve judgment on that, because there is another competing specification that might induce these same results, the fixed effects model. We will examine this possibility in the subsequent examples.

With the variance estimators in hand, FGLS can be used to estimate the parameters of the model. All our earlier results for FGLS estimators apply here. It would also be possible to obtain the maximum likelihood estimator.[16] The likelihood function is complicated, but as we have seen repeatedly, the MLE of β will be GLS based on the maximum likelihood estimators of the variance components. It can be shown that the MLEs of $\sigma_\varepsilon^2$ and $\sigma_u^2$ are the unbiased estimators shown earlier, without the degrees of freedom corrections.[17] This model satisfies the requirements for the Oberhofer–Kmenta (1974) algorithm (see Section 11.7.2), so we could also use the iterated FGLS procedure to obtain the MLEs if desired. The initial consistent estimators can be based on least squares residuals. Still other estimators have been proposed. None will have better asymptotic properties than the MLE or FGLS estimators, but they may outperform them in a finite sample.[18]

[16] See Hsiao (1986) and Nerlove (2003).
[17] See Berzeg (1979).
[18] See Maddala and Mount (1973).

Example 13.4 Random Effects Models

To compute the FGLS estimator, we require estimates of the variance components. The unbiased estimator of $\sigma_\varepsilon^2$ is the residual variance estimator in the within-units (LSDV) regression. Thus,

$$\hat{\sigma}_\varepsilon^2 = \frac{0.2926222}{90 - 9} = 0.0036126.$$

Using the least squares residuals from the pooled regression we have

$$\sigma_\varepsilon^2 + \sigma_u^2 = \frac{1.33544}{90 - 4} = 0.015528$$

so

$$\sigma_u^2 = 0.015528 - 0.0036126 = 0.0199158.$$

For purposes of FGLS,

$$\hat{\theta} = 1 - \left[\frac{0.0036126}{15(0.0199158)}\right]^{1/2} = 0.890032.$$

The FGLS estimates for this random effects model are shown in Table 13.2, with the fixed effects estimates. The estimated within-groups variance is larger than the between-groups variance by a factor of five. Thus, by these estimates, over 80 percent of the
disturbance variation is explained by variation within the groups, with only the small remainder explained by variation across groups.

None of the desirable properties of the estimators in the random effects model rely on T going to infinity.[19] Indeed, T is likely to be quite small. The maximum likelihood estimator of $\sigma_\varepsilon^2$ is exactly equal to an average of n estimators, each based on the T observations for unit i. [See (13-28).] Each component in this average is, in principle, consistent. That is, its variance is of order 1/T or smaller. Since T is small, this variance may be relatively large. But, each term provides some information about the parameter. The average over the n cross-sectional units has a variance of order 1/(nT), which will go to zero if n increases, even if we regard T as fixed. The conclusion to draw is that nothing in this treatment relies on T growing large. Although it can be shown that some consistency results will follow for T increasing, the typical panel data set is based on data sets for which it does not make sense to assume that T increases without bound or, in some cases, at all.[20] As a general proposition, it is necessary to take some care in devising estimators whose properties hinge on whether T is large or not. The widely used conventional ones we have discussed here do not, but we have not exhausted the possibilities.

The LSDV model does rely on T increasing for consistency. To see this, we use the partitioned regression. The slopes are

$$\mathbf{b} = [\mathbf{X}'\mathbf{M}_D\mathbf{X}]^{-1}[\mathbf{X}'\mathbf{M}_D\mathbf{y}].$$

Since X is nT × K, as long as the inverted moment matrix converges to a zero matrix, b is consistent as long as either n or T increases without bound. But the dummy variable coefficients are

$$a_i = \bar{y}_{i.} - \bar{\mathbf{x}}_{i.}'\mathbf{b} = \frac{1}{T}\sum_{t=1}^{T}(y_{it} - \mathbf{x}_{it}'\mathbf{b}).$$

We have already seen that b is consistent. Suppose, for the present, that $\bar{\mathbf{x}}_{i.} = \mathbf{0}$. Then Var[$a_i$] = Var[$y_{it}$]/T. Therefore, unless T → ∞, the estimators of the unit-specific effects are not consistent. (They are, however, best linear unbiased.) This inconsistency is worth bearing in mind when analyzing data sets for which T is fixed and there is no intention to replicate the study and no logical argument that would justify the claim that it could have been replicated in principle.

[19] See Nickell (1981).
[20] In this connection, Chamberlain (1984) provided some innovative treatments of panel data that, in fact, take T as given in the model and that base consistency results solely on n increasing. Some additional results for dynamic models are given by Bhargava and Sargan (1983).

The random effects model was developed by Balestra and Nerlove (1966). Their formulation included a time-specific component, $\kappa_t$, as well as the individual effect:

$$y_{it} = \alpha + \boldsymbol{\beta}'\mathbf{x}_{it} + \varepsilon_{it} + u_i + \kappa_t.$$

The extended formulation is rather complicated analytically. In Balestra and Nerlove's study, it was made even more so by the presence of a lagged dependent variable that causes all the problems discussed earlier in our discussion of autocorrelation. A full set of results for this extended model, including a method for handling the lagged dependent variable, has been developed.[21] We will turn to this in Section 13.7.

13.4.4 HAUSMAN'S SPECIFICATION TEST FOR THE RANDOM EFFECTS MODEL

At various points, we have made the distinction between fixed and random effects models. An inevitable question is, Which should be used? From a purely practical standpoint, the dummy variable approach is costly in terms of degrees of freedom lost. On the other hand, the fixed effects approach has one considerable virtue. There is little justification for treating the individual effects as uncorrelated with the other regressors, as is assumed in the random effects model. The random effects treatment, therefore, may suffer from the inconsistency due to this correlation between the included variables and the random effect.[22]

The specification test devised by Hausman (1978)[23] is used to test for orthogonality of the random effects and the regressors. The test is based on the idea that under the hypothesis of no correlation, both OLS in the LSDV model and GLS are consistent, but OLS is inefficient,[24] whereas under the alternative, OLS is consistent, but GLS is not. Therefore, under the null hypothesis, the two estimates should not differ systematically, and a test can be based on the difference. The other essential ingredient for the test is the covariance matrix of the difference vector, $[\mathbf{b} - \hat{\boldsymbol{\beta}}]$:

$$\text{Var}[\mathbf{b} - \hat{\boldsymbol{\beta}}] = \text{Var}[\mathbf{b}] + \text{Var}[\hat{\boldsymbol{\beta}}] - \text{Cov}[\mathbf{b}, \hat{\boldsymbol{\beta}}] - \text{Cov}[\mathbf{b}, \hat{\boldsymbol{\beta}}]'. \tag{13-32}$$

Hausman's essential result is that the covariance of an efficient estimator with its difference from an inefficient estimator is zero, which implies that

$$\text{Cov}[(\mathbf{b} - \hat{\boldsymbol{\beta}}), \hat{\boldsymbol{\beta}}] = \text{Cov}[\mathbf{b}, \hat{\boldsymbol{\beta}}] - \text{Var}[\hat{\boldsymbol{\beta}}] = \mathbf{0}$$

or that

$$\text{Cov}[\mathbf{b}, \hat{\boldsymbol{\beta}}] = \text{Var}[\hat{\boldsymbol{\beta}}].$$

[21] See Balestra and Nerlove (1966), Fomby, Hill, and Johnson (1984), Judge et al. (1985), Hsiao (1986), Anderson and Hsiao (1982), Nerlove (1971a, 2003), and Baltagi (1995).
[22] See Hausman and Taylor (1981) and Chamberlain (1978).
[23] Related results are given by Baltagi (1986).
[24] Referring to the GLS matrix weighted average given earlier, we see that the efficient weight uses θ, whereas OLS sets θ = 1.

Inserting th
is result in (13-32) produces the required covariance matrix for the test,

$$\text{Var}[\mathbf{b} - \hat{\boldsymbol{\beta}}] = \text{Var}[\mathbf{b}] - \text{Var}[\hat{\boldsymbol{\beta}}] = \boldsymbol{\Psi}. \tag{13-33}$$

The chi-squared test is based on the Wald criterion:

$$W = \chi^2[K-1] = [\mathbf{b} - \hat{\boldsymbol{\beta}}]'\hat{\boldsymbol{\Psi}}^{-1}[\mathbf{b} - \hat{\boldsymbol{\beta}}]. \tag{13-34}$$

For $\hat{\boldsymbol{\Psi}}$, we use the estimated covariance matrices of the slope estimator in the LSDV model and the estimated covariance matrix in the random effects model, excluding the constant term. Under the null hypothesis, W has a limiting chi-squared distribution with K − 1 degrees of freedom.

Example 13.5 Hausman Test

The Hausman test for the fixed and random effects regressions is based on the parts of the coefficient vectors and the asymptotic covariance matrices that correspond to the slopes in the models, that is, ignoring the constant term(s). The coefficient estimates are given in Table 13.2. The two estimated asymptotic covariance matrices are

$$\text{Est.Var}[\mathbf{b}_{FE}] = \begin{bmatrix}
0.0008934 & -0.0003178 & -0.001884\\
-0.0003178 & 0.0002310 & -0.0007686\\
-0.001884 & -0.0007686 & 0.04068
\end{bmatrix}$$

and

$$\text{Est.Var}[\mathbf{b}_{RE}] = \begin{bmatrix}
0.0006059 & -0.0002089 & -0.001450\\
-0.0002089 & 0.00018897 & -0.002141\\
-0.001450 & -0.002141 & 0.03973
\end{bmatrix}.$$

The test statistic is 4.16. The critical value from the chi-squared table with three degrees of freedom is 7.814, which is far larger than the test value. The hypothesis that the individual effects are uncorrelated with the other regressors in the model cannot be rejected. Based on the LM test, which is decisive that there are individual effects, and the Hausman test, which suggests that these effects are uncorrelated with the other variables in the model, we would conclude that of the two alternatives we have considered, the random effects model is the better choice.

TABLE 13.2 Random and Fixed Effects Estimates

Parameter Estimates (standard errors in parentheses)

Specification        β1          β2           β3            β4           R²        s²
No effects           9.517       0.88274      0.45398       −1.6275      0.98829   0.015528
                     (0.22924)   (0.013255)   (0.020304)    (0.34530)

Firm effects
Fixed effects                    0.91930      0.41749       −1.0704      0.99743   0.0036125
                                 (0.029890)   (0.015199)    (0.20169)
  White(1)                       (0.019105)   (0.013533)    (0.21662)
  White(2)                       (0.027977)   (0.013802)    (0.20372)
Fixed effects with               0.92975      0.38567       −1.22074               0.0019179
autocorrelation                  (0.033927)   (0.0167409)   (0.20174)
  ρ̂ = 0.5162                                                             s²/(1 − ρ̂²) = 0.002807
Random effects       9.6106      0.90412      0.42390       −1.0646      σ̂u² = 0.0119158
                     (0.20277)   (0.02462)    (0.01375)     (0.1993)     σ̂ε² = 0.00361262
Random effects with  10.139      0.91269      0.39123       −1.2074      σ̂u² = 0.0268079
autocorrelation      (0.2587)    (0.027783)   (0.016294)    (0.19852)    σ̂ε² = 0.0037341
  ρ̂ = 0.5162

Firm and time effects
Fixed effects        12.667      0.81725      0.16861       −0.88281     0.99845   0.0026727
                     (2.0811)    (0.031851)   (0.16348)     (0.26174)
Random effects       9.799       0.84328      0.38760       −0.92943     σ̂u² = 0.0142291
                     (0.87910)   (0.025839)   (0.06845)     (0.25721)    σ̂ε² = 0.0026395
                                                                         σ̂v² = 0.0551958

13.5 INSTRUMENTAL VARIABLES ESTIMATION OF THE RANDOM EFFECTS MODEL

Recall the original specification of the linear model for panel data in (13-1)

$$y_{it} = \mathbf{x}_{it}'\boldsymbol{\beta} + \mathbf{z}_i'\boldsymbol{\alpha} + \varepsilon_{it}. \tag{13-35}$$

The random effects model is based on the assumption that the unobserved person specific effects, $\mathbf{z}_i$, are uncorrelated with the included variables, $\mathbf{x}_{it}$. This assumption is a major shortcoming of the model. However, the random effects treatment does allow the model to contain observed time invariant characteristics, such as demographic characteristics, while the fixed effects model does not; if present, they are simply absorbed into the fixed effects. Hausman and Taylor's (1981) estimator for the random effects model suggests a way to overcome the first of these while accommodating the second.

Their model is of the form:

$$y_{it} = \mathbf{x}_{1it}'\boldsymbol{\beta}_1 + \mathbf{x}_{2it}'\boldsymbol{\beta}_2 + \mathbf{z}_{1i}'\boldsymbol{\alpha}_1 + \mathbf{z}_{2i}'\boldsymbol{\alpha}_2 + \varepsilon_{it} + u_i$$

where $\boldsymbol{\beta} = (\boldsymbol{\beta}_1', \boldsymbol{\beta}_2')'$ and $\boldsymbol{\alpha} = (\boldsymbol{\alpha}_1', \boldsymbol{\alpha}_2')'$. In this formulation, all individual effects denoted $\mathbf{z}_i$ are observed. As before, unobserved individual effects that are contained in $\mathbf{z}_i'\boldsymbol{\alpha}$ in (13-35) are contained in the person specific random term, $u_i$. Hausman and Taylor define four sets of observed variables in the model:

$\mathbf{x}_{1it}$ is $K_1$ variables that are time varying and uncorrelated with $u_i$,
$\mathbf{z}_{1i}$ is $L_1$ variables that are time invariant and uncorrelated with $u_i$,
$\mathbf{x}_{2it}$ is $K_2$ variables that are time varying and are correlated with $u_i$,
$\mathbf{z}_{2i}$ is $L_2$ variables that are time invariant and are correlated with $u_i$.

The assumptions
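about the random terms in the model are

$$\begin{aligned}
&E[u_i] = E[u_i \mid \mathbf{x}_{1it}, \mathbf{z}_{1i}] = 0 \text{ though } E[u_i \mid \mathbf{x}_{2it}, \mathbf{z}_{2i}] \neq 0,\\
&\text{Var}[u_i \mid \mathbf{x}_{1it}, \mathbf{z}_{1i}, \mathbf{x}_{2it}, \mathbf{z}_{2i}] = \sigma_u^2,\\
&\text{Cov}[\varepsilon_{it}, u_i \mid \mathbf{x}_{1it}, \mathbf{z}_{1i}, \mathbf{x}_{2it}, \mathbf{z}_{2i}] = 0,\\
&\text{Var}[\varepsilon_{it} + u_i \mid \mathbf{x}_{1it}, \mathbf{z}_{1i}, \mathbf{x}_{2it}, \mathbf{z}_{2i}] = \sigma^2 = \sigma_\varepsilon^2 + \sigma_u^2,\\
&\text{Corr}[\varepsilon_{it} + u_i, \varepsilon_{is} + u_i \mid \mathbf{x}_{1it}, \mathbf{z}_{1i}, \mathbf{x}_{2it}, \mathbf{z}_{2i}] = \rho = \sigma_u^2/\sigma^2.
\end{aligned}$$

Note the crucial assumption that one can distinguish sets of variables $\mathbf{x}_1$ and $\mathbf{z}_1$ that are uncorrelated with $u_i$ from $\mathbf{x}_2$ and $\mathbf{z}_2$ which are not. The likely presence of $\mathbf{x}_2$ and $\mathbf{z}_2$ is what complicates specification and estimation of the random effects model in the first place.

By construction, any OLS or GLS estimators of this model are inconsistent when the model contains variables that are correlated with the random effects. Hausman and Taylor have proposed an instrumental variables estimator that uses only the information within the model (i.e., as already stated). The strategy for estimation is based on the following logic: First, by taking deviations from group means, we find that

$$y_{it} - \bar{y}_{i.} = (\mathbf{x}_{1it} - \bar{\mathbf{x}}_{1i})'\boldsymbol{\beta}_1 + (\mathbf{x}_{2it} - \bar{\mathbf{x}}_{2i})'\boldsymbol{\beta}_2 + \varepsilon_{it} - \bar{\varepsilon}_i, \tag{13-36}$$

which implies that β can be consistently estimated by least squares, in spite of the correlation between $\mathbf{x}_2$ and u. This is the familiar, fixed effects, least squares dummy variable estimator: the transformation to deviations from group means removes from the model the part of the disturbance that is correlated with $\mathbf{x}_{2it}$. Now, in the original model, Hausman and Taylor show that the group mean deviations can be used as $(K_1 + K_2)$ instrumental variables for estimation of (β, α). That is the implication of (13-36). Since $\mathbf{z}_1$ is uncorrelated with the disturbances, it can likewise serve as a set of $L_1$ instrumental variables. That leaves a necessity for $L_2$ instrumental variables. The authors show that the group means for $\mathbf{x}_1$ can serve as these remaining instruments, and the model will be identified so long as $K_1$ is greater than or equal to $L_2$. For identification purposes, then, $K_1$ must be at least as large as $L_2$. As usual, feasible GLS is better than OLS, and available. Likewise, FGLS is an improvement over simple instrumental variable estimation of the model, which is consistent but inefficient.

The authors propose the following set of steps for consistent and efficient estimation:

Step 1. Obtain the LSDV (fixed effects) estimator of $\boldsymbol{\beta} = (\boldsymbol{\beta}_1', \boldsymbol{\beta}_2')'$ based on $\mathbf{x}_1$ and $\mathbf{x}_2$. The residual varia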
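nce estimator from this step is a consistent estimator of $\sigma_\varepsilon^2$.

Step 2. Form the within groups residuals, $e_{it}$, from the LSDV regression at step 1. Stack the group means of these residuals in a full sample length data vector. Thus, $e_{it}^* = \bar{e}_{i.}$, t = 1, ..., T, i = 1, ..., n. These group means are used as the dependent variable in an instrumental variable regression on $\mathbf{z}_1$ and $\mathbf{z}_2$ with instrumental variables $\mathbf{z}_1$ and $\mathbf{x}_1$. (Note the identification requirement that $K_1$, the number of variables in $\mathbf{x}_1$, be at least as large as $L_2$, the number of variables in $\mathbf{z}_2$.) The time invariant variables are each repeated T times in the data matrices in this regression. This provides a consistent estimator of α.

Step 3. The residual variance in the regression in step 2 is a consistent estimator of $\sigma^{*2} = \sigma_u^2 + \sigma_\varepsilon^2/T$. From this estimator and the estimator of $\sigma_\varepsilon^2$ in step 1, we deduce an estimator of $\sigma_u^2 = \sigma^{*2} - \sigma_\varepsilon^2/T$. We then form the weight for feasible GLS in this model by forming the estimate of

$$\theta = \sqrt{\frac{\sigma_\varepsilon^2}{\sigma_\varepsilon^2 + T\sigma_u^2}}.$$

Step 4. The final step is a weighted instrumental variable estimator. Let the full set of variables in the model be

$$\mathbf{w}_{it}' = (\mathbf{x}_{1it}', \mathbf{x}_{2it}', \mathbf{z}_{1i}', \mathbf{z}_{2i}').$$

Collect these nT observations in the rows of data matrix W. The transformed variables for GLS are, as before when we first fit the random effects model,

$$\mathbf{w}_{it}^{*\prime} = \mathbf{w}_{it}' - (1 - \hat{\theta})\bar{\mathbf{w}}_i' \quad \text{and} \quad y_{it}^* = y_{it} - (1 - \hat{\theta})\bar{y}_i$$

where $\hat{\theta}$ denotes the sample estimate of θ. The transformed data are collected in the rows of data matrix $\mathbf{W}^*$ and in column vector $\mathbf{y}^*$. Note in the case of the time invariant variables in $\mathbf{w}_{it}$, the group mean is the original variable, and the transformation just multiplies the variable by $\hat{\theta}$. The instrumental variables are

$$\mathbf{v}_{it}' = [(\mathbf{x}_{1it} - \bar{\mathbf{x}}_{1i})', (\mathbf{x}_{2it} - \bar{\mathbf{x}}_{2i})', \mathbf{z}_{1i}', \bar{\mathbf{x}}_{1i}'].$$

These are stacked in the rows of the $nT \times (K_1 + K_2 + L_1 + K_1)$ matrix V. Note for the third and fourth sets of instruments, the time invariant variables and group means are repeated for each member of the group. The instrumental variable estimator would be[25]

$$(\hat{\boldsymbol{\beta}}', \hat{\boldsymbol{\alpha}}')'_{IV} = [(\mathbf{W}^{*\prime}\mathbf{V})(\mathbf{V}'\mathbf{V})^{-1}(\mathbf{V}'\mathbf{W}^*)]^{-1}[(\mathbf{W}^{*\prime}\mathbf{V})(\mathbf{V}'\mathbf{V})^{-1}(\mathbf{V}'\mathbf{y}^*)]. \tag{13-37}$$

The instrumental variable estimator is consistent if the data are not weighted, that is, if W rather than $\mathbf{W}^*$ is used in the computation. But, this is inefficient, in the same way that OLS is consistent but inefficient in estimation of the simpler random effects model.

Example 13.6 The Returns to Schooling

The economic return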
s to schooling have been a frequent topic of study by econometricians. The PSID and NLS data sets have provided a rich source of panel data for this effort. In wage (or log wage) equations, it is clear that the economic benefits of schooling are correlated with latent, unmeasured characteristics of the individual such as innate ability, intelligence, drive, or perseverance. As such, there is little question that simple random effects models based on panel data will suffer from the effects noted earlier. The fixed effects model is the obvious alternative, but these rich data sets contain many useful variables, such as race, union membership, and marital status, which are generally time invariant. Worse yet, the variable most of interest, years of schooling, is also time invariant. Hausman and Taylor (1981) proposed the estimator described here as a solution to these problems. The authors studied the effect of schooling on (the log of) wages using a random sample from the PSID of 750 men aged 25–55, observed in two years, 1968 and 1972. The two years were chosen so as to minimize the effect of serial correlation apart from the persistent unmeasured individual effects. The variables used in their model were as follows:

Experience = age − years of schooling − 5,
Years of schooling,
Bad Health = a dummy variable indicating general health,
Race = a dummy variable indicating nonwhite (70 of 750 observations),
Union = a dummy variable indicating union membership,
Unemployed = a dummy variable indicating previous year's unemployment.

The model also included a constant term and a period indicator. [The coding of the latter is not given, but any two distinct values, including 0 for 1968 and 1 for 1972 would produce identical results. (Why?)]

The primary focus of the study is the coefficient on schooling in the log wage equation. Since schooling and, probably, Experience and Unemployed are correlated with the latent effect, there is likely to be serious bias in conventional estimates of this equation. Table 13.3 reports some of their reported results. The OLS and random effects GLS results in the first two columns provide the benchmark for the rest of the study. The schooling coefficient is estimated at 0.067, a value which the authors suspected was far too small. As we saw earlier, even in the presence of correlation between measured and latent effects, in this model, the LSDV estimator provides a consistent estimator of the coefficients on the time varying variables. Therefore, we can use it in the Hausman specification test for correlation between the included variables and the latent heterogeneity. The calculations are shown in Section 13.4.4, result (13-34). Since there are three variables remaining in the LSDV equation, the chi-squared statistic has three degrees of freedom. The reported value of 20.2 is far larger than the 95 percent critical value of 7.81, so the results suggest that the random effects model is misspecified.

[25] Note that the FGLS random effects estimator would be $(\hat{\boldsymbol{\beta}}', \hat{\boldsymbol{\alpha}}')'_{RE} = [\mathbf{W}^{*\prime}\mathbf{W}^*]^{-1}\mathbf{W}^{*\prime}\mathbf{y}^*$.

TABLE 13.3 Estimated Log Wage Equations

Variables                    OLS         GLS/RE     LSDV       HT/IV-GLS   HT/IV-GLS
x1  Experience               0.0132      0.0133     0.0241     0.0217
                             (0.0011)a   (0.0017)   (0.0042)   (0.0031)
    Bad health               −0.0843     −0.0300    −0.0388    −0.0278     −0.0388
                             (0.0412)    (0.0363)   (0.0460)   (0.0307)    (0.0348)
    Unemployed Last Year     −0.0015     −0.0402    −0.0560    −0.0559
                             (0.0267)    (0.0207)   (0.0295)   (0.0246)
    Time                     NRb         NR         NR         NR          NR
x2  Experience                                                             0.0241
                                                                           (0.0045)
    Unemployed                                                             −0.0560
                                                                           (0.0279)
z1  Race                     −0.0853     −0.0878               −0.0278     −0.0175
                             (0.0328)    (0.0518)              (0.0752)    (0.0764)
    Union                    0.0450      0.0374                0.1227      0.2240
                             (0.0191)    (0.0296)              (0.0473)    (0.2863)
    Schooling                0.0669      0.0676
                             (0.0033)    (0.0052)
    Constant                 NR          NR         NR         NR          NR
z2  Schooling                                                  0.1246      0.2169
                                                               (0.0434)    (0.0979)
σε                           0.321       0.192      0.160      0.190       0.629
ρ = σu²/(σu² + σε²)                      0.632                 0.661       0.817
Spec. Test [3]                           20.2                  2.24        0.00

a Estimated asymptotic standard errors are given in parentheses.
b NR indicates that the coefficient estimate was not reported in the study.

Hausman and Taylor proceeded to reestimate the log wage equation using their proposed estimator. The fourth and fifth sets of results in Table 13.3 present the instrumental variable estimates. The specification test given with the fourth set of results suggests that the procedure has produced the desired result. The hypothesis of the modified random effects model is now not rejected; the chi-squared value of 2.24 is much smaller than the critical value. The schooling variable is treated as endogenous (correlated with $u_i$) in both cases. The difference between the two is the treatment of Unemployed and Experience. In the preferred equation, they are included in $\mathbf{x}_2$ rather than $\mathbf{x}_1$. The e
nd result of the exercise is, again, the coefficient on schooling, which has risen from 0.0669 in the worst specification (OLS) to 0.2169 in the last one, a difference of over 200 percent. As the authors note, at the same time, the measured effect of race nearly vanishes.

13.6 GMM ESTIMATION OF DYNAMIC PANEL DATA MODELS

Panel data are well suited for examining dynamic effects, as in the first-order model,

$$\begin{aligned}
y_{it} &= \mathbf{x}_{it}'\boldsymbol{\beta} + \gamma y_{i,t-1} + \alpha_i + \varepsilon_{it}\\
&= \mathbf{w}_{it}'\boldsymbol{\delta} + \alpha_i + \varepsilon_{it},
\end{aligned}$$

where the set of right hand side variables, $\mathbf{w}_{it}$, now includes the lagged dependent variable, $y_{i,t-1}$. Adding dynamics to a model in this fashion is a major change in the interpretation of the equation. Without the lagged variable, the "independent variables" represent the full set of information that produce observed outcome $y_{it}$. With the lagged variable, we now have in the equation, the entire history of the right hand side variables, so that any measured influence is conditioned on this history; in this case, any impact of $\mathbf{x}_{it}$ represents the effect of new information. Substantial complications arise in estimation of such a model. In both the fixed and random effects settings, the difficulty is that the lagged dependent variable is correlated with the disturbance, even if it is assumed that $\varepsilon_{it}$ is not itself autocorrelated. For the moment, consider the fixed effects model as an ordinary regression with a lagged dependent variable. We considered this case in Section 5.3.2 as a regression with a stochastic regressor that is dependent across observations. In that dynamic regression model, the estimator based on T observations is biased in finite samples, but it is consistent in T. That conclusion was the main result of Section 5.3.2. The finite sample bias is of order 1/T. The same result applies here, but the difference is that whereas before we obtained our large sample results by allowing T to grow large, in this setting, T is assumed to be small and fixed, and large-sample results are obtained with respect to n growing large, not T. The fixed effects estimator of $\boldsymbol{\delta} = [\boldsymbol{\beta}', \gamma]'$ can be viewed as an average of n such estimators. Assume for now that T ≥ K + 1 where K is the number of variables in $\mathbf{x}_{it}$. Then, from (13-4),

$$\begin{aligned}
\hat{\boldsymbol{\delta}} &= \left[\sum_{i=1}^{n}\mathbf{W}_i'\mathbf{M}^0\mathbf{W}_i\right]^{-1}\left[\sum_{i=1}^{n}\mathbf{W}_i'\mathbf{M}^0\mathbf{y}_i\right]\\
&= \left[\sum_{i=1}^{n}\mathbf{W}_i'\mathbf{M}^0\mathbf{W}_i\right]^{-1}\left[\sum_{i=1}^{n}\mathbf{W}_i'\mathbf{M}^0\mathbf{W}_i\mathbf{d}_i\right]\\
&= \sum_{i=1}^{n}\mathbf{F}_i\mathbf{d}_i
\end{aligned}$$

where the rows of the T × (K + 1) matrix $\mathbf{W}_i$ are $\mathbf{w}_{it}'$ and $\mathbf{M}^0$ is the T × T matr
ixthatiitcreatesdeviationsfromgroupmeans[see(13-5)].Eachgroupspecificestimator,diisinconsistent,asitisbiasedinfinitesamplesanditsvariancedoesnotgotozeroasnincreases.Thismatrixweightedaverageofninconsistentestimatorswillalsobeinconsistent.(Thisanalysisisonlyheuristic.IfTT.Inthiscase,then×nmatrixhasrankTwhichislessthann,soitmustbesingular,andtheFGLSestimatorcannotbecomputed.Forexample,astudyof20countrieseachobservedfor10yearswouldbesuchacase.Thisresultisadeficiencyofthedataset,notthemodel.Thepopulationmatrix,ispositivedefinite.But,iftherearenotenoughobservations,thenthedatasetistooshorttoobtainapositivedefiniteestimateofthematrix.Theheteroscedasticitymodeldescribedinthenextsectioncanalwaysbecomputed,however.36See,forexample,Kmenta(1986,p.620).Elsewhere,forexample,inFomby,Hill,andJohnson(1984,p.327),Tisusedinstead.\nGreene-50240bookJune18,200215:28CHAPTER13✦ModelsforPanelData32313.9.3HETEROSCEDASTICITYANDTHECLASSICALMODELTwospecialcasesofthismodelareofinterest.ThegroupwiseheteroscedasticmodelofSection11.7.2resultsiftheoffdiagonaltermsinallequalzero.Then,theGLSestimator,aswesawearlier,is−1n1n1βˆ=[X−1X]−1[X−1y]=XXXy.2ii2iiσiσii=1i=1Ofcourse,thedisturbancevariances,σ2,areunknown,sothetwo-stepFGLSmethodinotedearlier,nowbasedonlyonthediagonalelementsofwouldbeused.Thesecondspecialcaseistheclassicalregressionmodel,whichaddsthefurtherrestrictionσ2=1σ2=···=σ2.Wewouldnowstackthedatainthepooledregressionmodelin2ny=Xβ+ε.Forthissimplemodel,theGLSestimatorreducestopooledordinaryleastsquares.BeckandKatz(1995)suggestedthatthestandarderrorsfortheOLSestimatesinthismodelshouldbecorrectedforthepossiblemisspecificationthatwouldariseifσijijwerecorrectlyspecifiedby(13-49)insteadofσ2I,asnowassumed.TheappropriateasymptoticcovariancematrixforOLSinthegeneralcaseis,asalways,Asy.Var[b]=(XX)−1XX(XX)−1.Forthespecialcaseofij=σijI,−1−1nnnnAsy.Var[b]=XXσXXXX.(13-56)iiijijiii=1i=1j=1i=1Thisestimatorisstraightforwardtocomputewithestimatesofσijinhand.SincetheOLSestimatorisconsistent,(13-54)maybeusedtoesti
mate $\sigma_{ij}$.

13.9.4 SPECIFICATION TESTS

We are interested in testing down from the general model to the simpler forms if possible. Since the model specified thus far is distribution free, the standard approaches, such as likelihood ratio tests, are not available. We propose the following procedure. Under the null hypothesis of a common variance, $\sigma^2$ (i.e., the classical model), the Wald statistic for testing the null hypothesis against the alternative of the groupwise heteroscedasticity model would be

$$W = \sum_{i=1}^{n}\frac{\left(\hat{\sigma}_i^2 - \sigma^2\right)^2}{\text{Var}\left[\hat{\sigma}_i^2\right]}.$$

If the null hypothesis is correct,

$$W \xrightarrow{d} \chi^2[n].$$

By hypothesis,

$$\text{plim}\,\hat{\sigma}^2 = \sigma^2,$$

where $\hat{\sigma}^2$ is the disturbance variance estimator from the pooled OLS regression. We must now consider $\text{Var}[\hat{\sigma}_i^2]$. Since

$$\hat{\sigma}_i^2 = \frac{1}{T}\sum_{t=1}^{T}e_{it}^2$$

is a mean of $T$ observations, we may estimate $\text{Var}[\hat{\sigma}_i^2]$ with$^{37}$

$$f_{ii} = \frac{1}{T}\,\frac{1}{T-1}\sum_{t=1}^{T}\left(e_{it}^2 - \hat{\sigma}_i^2\right)^2. \tag{13-57}$$

The modified Wald statistic is then

$$W' = \sum_{i=1}^{n}\frac{\left(\hat{\sigma}_i^2 - \hat{\sigma}^2\right)^2}{f_{ii}}.$$

A Lagrange multiplier statistic is also simple to compute and asymptotically equivalent to a likelihood ratio test; we consider these below. But these assume normality, which we have not yet invoked. To this point, our specification is distribution free. White's general test$^{38}$ is an alternative. To use White's test, we would regress the squared OLS residuals on the $P$ unique variables in $x$ and the squares and cross products, including a constant. The chi-squared statistic, which has $P - 1$ degrees of freedom, is $(nT)R^2$.

For the full model with nonzero off-diagonal elements in $\Sigma$, the preceding approach must be modified. One might consider simply adding the corresponding terms for the off-diagonal elements, with a common $\sigma_{ij} = 0$, but this neglects the fact that under this broader alternative hypothesis, the original $n$ variance estimators are no longer uncorrelated, even asymptotically, so the limiting distribution of the Wald statistic is no longer chi-squared. Alternative approaches that have been suggested [see, e.g., Johnson and Wichern (1999, p. 424)] are based on the following general strategy: Under the alternative hypothesis of an unrestricted $\Sigma$, the sample estimate of $\Sigma$ will be $\hat{\Sigma} = [\hat{\sigma}_{ij}]$ as defined in (13-54). Under any restrictive null hypothesis, the estimator of $\Sigma$ will be $\hat{\Sigma}_0$, a matrix that by construction will be larger than $\hat{\Sigma}$ in the matrix sense defined in Appendix A. Statistics based on the "excess variation," such as $T(\hat{\Sigma}_0 - \hat{\Sigma})$, are suggested for the testing procedure. One of these is the likelihood ratio test that we will consider in Section 13.9.6.

$^{37}$ Note that this would apply strictly if we had observed the true disturbances, $\varepsilon_{it}$. We are using the residuals as estimates of their population counterparts. Since the coefficient vector is consistent, this procedure will obtain the desired results.
$^{38}$ See Section 11.4.1.

13.9.5 AUTOCORRELATION

The preceding discussion dealt with heteroscedasticity and cross-sectional correlation. Through a simple modification of the procedures, it is possible to relax the assumption of nonautocorrelation as well. It is simplest to begin with the assumption that

$$\text{Corr}[\varepsilon_{it}, \varepsilon_{js}] = 0, \quad \text{if } i \ne j.$$

That is, the disturbances between cross-sectional units are uncorrelated. Now, we can take the approach of Chapter 12 to allow for autocorrelation within the cross-sectional units. That is,

$$\varepsilon_{it} = \rho_i\varepsilon_{i,t-1} + u_{it}, \qquad \text{Var}[\varepsilon_{it}] = \sigma_i^2 = \frac{\sigma_{ui}^2}{1 - \rho_i^2}. \tag{13-58}$$

For FGLS estimation of the model, suppose that $r_i$ is a consistent estimator of $\rho_i$. Then, if we take each time series $[y_i, X_i]$ separately, we can transform the data using the Prais–Winsten transformation:

$$y_{*i} = \begin{bmatrix}\sqrt{1 - r_i^2}\,y_{i1}\\ y_{i2} - r_i y_{i1}\\ y_{i3} - r_i y_{i2}\\ \vdots\\ y_{iT} - r_i y_{i,T-1}\end{bmatrix}, \qquad X_{*i} = \begin{bmatrix}\sqrt{1 - r_i^2}\,x_{i1}\\ x_{i2} - r_i x_{i1}\\ x_{i3} - r_i x_{i2}\\ \vdots\\ x_{iT} - r_i x_{i,T-1}\end{bmatrix}. \tag{13-59}$$

In terms of the transformed data $y_{*i}$ and $X_{*i}$, the model is now only heteroscedastic; the transformation has removed the autocorrelation. As such, the groupwise heteroscedastic model applies to the transformed data. We may now use weighted least squares, as described earlier. This requires a second least squares estimate. The first, OLS regression produces initial estimates of $\rho_i$. The transformed data are then used in a second least squares regression to obtain consistent estimators,

$$\hat{\sigma}_{ui}^2 = \frac{e_{*i}'e_{*i}}{T} = \frac{(y_{*i} - X_{*i}\hat{\beta})'(y_{*i} - X_{*i}\hat{\beta})}{T}. \tag{13-60}$$

[Note that both the initial OLS and the second round FGLS estimators of $\beta$ are consistent, so either could be used in (13-60). We have used $\hat{\beta}$ to denote the coefficient vector used, whichever one is chosen.] With these results in hand, we may proceed to the calculation of the groupwise heteroscedastic regression in Section 13.9.3. At the end of the calculation, the moment matrix used in the last regression gives the corre
ct asymptotic covariance matrix for the estimator, now $\hat{\hat{\beta}}$. If desired, then a consistent estimator of $\sigma_{\varepsilon i}^2$ is

$$\hat{\sigma}_{\varepsilon i}^2 = \frac{\hat{\sigma}_{ui}^2}{1 - r_i^2}. \tag{13-61}$$

The remaining question is how to obtain the initial estimates $r_i$. There are two possible structures to consider. If each group is assumed to have its own autocorrelation coefficient, then the choices are the same ones examined in Chapter 12; the natural choice would be

$$r_i = \frac{\sum_{t=2}^{T}e_{it}e_{i,t-1}}{\sum_{t=1}^{T}e_{it}^2}.$$

If the disturbances have a common stochastic process with the same $\rho_i$, then several estimators of the common $\rho$ are available. One which is analogous to that used in the single equation case is

$$r = \frac{\sum_{i=1}^{n}\sum_{t=2}^{T}e_{it}e_{i,t-1}}{\sum_{i=1}^{n}\sum_{t=1}^{T}e_{it}^2}. \tag{13-62}$$

Another consistent estimator would be the sample average of the group specific estimated autocorrelation coefficients.

Finally, one may wish to allow for cross-sectional correlation across units. The preceding has a natural generalization. If we assume that

$$\text{Cov}[u_{it}, u_{jt}] = \sigma_{uij},$$

then we obtain the original model in (13-49) in which the off-diagonal blocks of $\Omega$ are

$$\sigma_{ij}\Omega_{ij} = \frac{\sigma_{uij}}{1 - \rho_i\rho_j}\begin{bmatrix}1 & \rho_j & \rho_j^2 & \cdots & \rho_j^{T-1}\\ \rho_i & 1 & \rho_j & \cdots & \rho_j^{T-2}\\ \rho_i^2 & \rho_i & 1 & \cdots & \rho_j^{T-3}\\ \vdots & & & \ddots & \vdots\\ \rho_i^{T-1} & \rho_i^{T-2} & \rho_i^{T-3} & \cdots & 1\end{bmatrix}. \tag{13-63}$$

Initial estimates of $\rho_i$ are required, as before. The Prais–Winsten transformation renders all the blocks in $\Omega$ diagonal. Therefore, the model of cross-sectional correlation in Section 13.9.2 applies to the transformed data. Once again, the GLS moment matrix obtained at the last step provides the asymptotic covariance matrix for $\hat{\hat{\beta}}$. Estimates of $\sigma_{\varepsilon ij}$ can be obtained from the least squares residual covariances obtained from the transformed data:

$$\hat{\sigma}_{\varepsilon ij} = \frac{\hat{\sigma}_{uij}}{1 - r_i r_j}, \tag{13-64}$$

where $\hat{\sigma}_{uij} = e_{*i}'e_{*j}/T$.

13.9.6 MAXIMUM LIKELIHOOD ESTIMATION

Consider the general model with groupwise heteroscedasticity and cross group correlation. The covariance matrix is the $\Omega$ in (13-49). We now assume that the $n$ disturbances at time $t$, $\varepsilon_t$, have a multivariate normal distribution with zero mean and this $n \times n$ covariance matrix. Taking logs and summing over the $T$ periods gives the log-likelihood for the sample,

$$\ln L(\beta, \Sigma \mid \text{data}) = -\frac{nT}{2}\ln 2\pi - \frac{T}{2}\ln|\Sigma| - \frac{1}{2}\sum_{t=1}^{T}\varepsilon_t'\Sigma^{-1}\varepsilon_t, \tag{13-65}$$

$$\varepsilon_{it} = y_{it} - x_{it}'\beta, \quad i = 1, \ldots, n.$$

(This log-likelihood is analyzed at length in Section 14.2.4, so we defer the more detailed analysis until then.) The result is that the maximum likelihood estimator of $\beta$ is the generalized least squares estimator in (13-53). Since the elements of $\Sigma$ must be estimated, the FGLS estimator in (13-54) is used, based on the MLE of $\Sigma$. As shown in Section 14.2.4, the maximum likelihood estimator of $\Sigma$ is

$$\hat{\sigma}_{ij} = \frac{(y_i - X_i\hat{\beta}_{ML})'(y_j - X_j\hat{\beta}_{ML})}{T} = \frac{\hat{\varepsilon}_i'\hat{\varepsilon}_j}{T} \tag{13-66}$$

based on the MLE of $\beta$. Since each MLE requires the other, how can we proceed to obtain both? The answer is provided by Oberhofer and Kmenta (1974), who show that for certain models, including this one, one can iterate back and forth between the two estimators. (This is the same estimator we used in Section 11.7.2.) Thus, the MLEs are obtained by iterating to convergence between (13-66) and

$$\hat{\hat{\beta}} = [X'\hat{\Omega}^{-1}X]^{-1}[X'\hat{\Omega}^{-1}y].$$

The process may begin with the (consistent) ordinary least squares estimator, then (13-66), and so on. The computations are simple, using basic matrix algebra. Hypothesis tests about $\beta$ may be done using the familiar Wald statistic. The appropriate estimator of the asymptotic covariance matrix is the inverse matrix in brackets in (13-55).

For testing the hypothesis that the off-diagonal elements of $\Sigma$ are zero (that is, that there is no correlation across firms), there are three approaches. The likelihood ratio test is based on the statistic

$$\lambda_{LR} = T\left(\ln|\hat{\Sigma}_{\text{heteroscedastic}}| - \ln|\hat{\Sigma}_{\text{general}}|\right) = T\left(\sum_{i=1}^{n}\ln\hat{\sigma}_i^2 - \ln|\hat{\Sigma}|\right), \tag{13-67}$$

where $\hat{\sigma}_i^2$ are the estimates of $\sigma_i^2$ obtained from the maximum likelihood estimates of the groupwise heteroscedastic model and $\hat{\Sigma}$ is the maximum likelihood estimator in the unrestricted model. (Note how the excess variation produced by the restrictive model is used to construct the test.) The large-sample distribution of the statistic is chi-squared with $n(n-1)/2$ degrees of freedom. The Lagrange multiplier test developed by Breusch and Pagan (1980) provides an alternative. The general form of the statistic is

$$\lambda_{LM} = T\sum_{i=2}^{n}\sum_{j=1}^{i-1}r_{ij}^2, \tag{13-68}$$

where $r_{ij}$ is the $ij$th residual correlation coefficient. If every individual had a different parameter vector, then individual specific ordinary least squares would be efficient (and ML) and we would compute $r_{ij}$ from the OLS residuals (assuming that there are sufficient observations for the computation). Here, however, we are assuming only a single parameter vector. Therefore, the appropriate basis for computing the correlations is the r
esiduals from the iterated estimator in the groupwise heteroscedastic model, that is, the same residuals used to compute $\hat{\sigma}_i^2$. (An asymptotically valid approximation to the test can be based on the FGLS residuals instead.) Note that this is not a procedure for testing all the way down to the classical, homoscedastic regression model. That case, which involves different LM and LR statistics, is discussed next. If either the LR statistic in (13-67) or the LM statistic in (13-68) is smaller than the critical value from the table, the conclusion, based on this test, is that the appropriate model is the groupwise heteroscedastic model.

For the groupwise heteroscedasticity model, ML estimation reduces to groupwise weighted least squares. The maximum likelihood estimator of $\beta$ is feasible GLS. The maximum likelihood estimator of the group specific variances is given by the diagonal element in (13-66), while the cross group covariances are now zero. An additional useful result is provided by the negative of the expected second derivatives matrix of the log-likelihood in (13-65) with diagonal $\Sigma$,

$$-E[H(\beta, \sigma_i^2, i = 1, \ldots, n)] = \begin{bmatrix}\displaystyle\sum_{i=1}^{n}\frac{1}{\sigma_i^2}X_i'X_i & 0\\ 0 & \text{diag}\left(\dfrac{T}{2\sigma_i^4},\ i = 1, \ldots, n\right)\end{bmatrix}.$$

Since the expected Hessian is block diagonal, the complete set of maximum likelihood estimates can be computed by iterating back and forth between these estimators for $\sigma_i^2$ and the feasible GLS estimator of $\beta$. (This process is also equivalent to using a set of $n$ group dummy variables in Harvey's model of heteroscedasticity in Section 11.7.1.)

For testing the heteroscedasticity assumption of the model, the full set of test strategies that we have used before is available. The Lagrange multiplier test is probably the most convenient test, since it does not require another regression after the pooled least squares regression. It is convenient to rewrite

$$\frac{\partial\log L}{\partial\sigma_i^2} = \frac{T}{2\sigma_i^2}\left[\frac{\hat{\sigma}_i^2}{\sigma_i^2} - 1\right],$$

where $\hat{\sigma}_i^2$ is the $i$th unit-specific estimate of $\sigma_i^2$ based on the true (but unobserved) disturbances. Under the null hypothesis of equal variances, regardless of what the common restricted estimator of $\sigma_i^2$ is, the first-order condition for equating $\partial\ln L/\partial\beta$ to zero will be the OLS normal equations, so the restricted estimator of $\beta$ is $b$ using the pooled data. To obtain the restricted estimator of $\sigma_i^2$, return to the log-likelihood function. Under the null hypothesis $\sigma_i^2 = \sigma^2$, $i = 1, \ldots, n$, the first derivative of the log-likelihood function with respect to this common $\sigma^2$ is

$$\frac{\partial\log L_R}{\partial\sigma^2} = -\frac{nT}{2\sigma^2} + \frac{1}{2\sigma^4}\sum_{i=1}^{n}\varepsilon_i'\varepsilon_i.$$

Equating this derivative to zero produces the restricted maximum likelihood estimator

$$\hat{\sigma}^2 = \frac{1}{nT}\sum_{i=1}^{n}\varepsilon_i'\varepsilon_i = \frac{1}{n}\sum_{i=1}^{n}\hat{\sigma}_i^2,$$

which is the simple average of the $n$ individual consistent estimators. Using the least squares residuals at the restricted solution, we obtain $\hat{\sigma}^2 = (1/nT)e'e$ and $\hat{\sigma}_i^2 = (1/T)e_i'e_i$. With these results in hand and using the estimate of the expected Hessian for the covariance matrix, the Lagrange multiplier statistic reduces to

$$\lambda_{LM} = \sum_{i=1}^{n}\left[\frac{T}{2\hat{\sigma}^2}\left(\frac{\hat{\sigma}_i^2}{\hat{\sigma}^2} - 1\right)\right]^2\left(\frac{2\hat{\sigma}^4}{T}\right) = \frac{T}{2}\sum_{i=1}^{n}\left[\frac{\hat{\sigma}_i^2}{\hat{\sigma}^2} - 1\right]^2.$$

The statistic has $n - 1$ degrees of freedom. (It has only $n - 1$ since the restriction is that the variances are all equal to each other, not a specific value, which is $n - 1$ restrictions.)

With the unrestricted estimates, as an alternative test procedure, we may use the Wald statistic. If we assume normality, then the asymptotic variance of each variance estimator is $2\sigma_i^4/T$ and the variances are asymptotically uncorrelated. Therefore, the Wald statistic to test the hypothesis of a common variance $\sigma^2$, using $\hat{\sigma}_i^2$ to estimate $\sigma_i^2$, is

$$W = \sum_{i=1}^{n}\left(\hat{\sigma}_i^2 - \sigma^2\right)^2\left(\frac{2\hat{\sigma}_i^4}{T}\right)^{-1} = \frac{T}{2}\sum_{i=1}^{n}\left[\frac{\sigma^2}{\hat{\sigma}_i^2} - 1\right]^2.$$

Note the similarity to the Lagrange multiplier statistic. The estimator of the common variance would be the pooled estimator from the first least squares regression. Recall, we produced a general counterpart for this statistic for the case in which disturbances are not normally distributed.

We can also carry out a likelihood ratio test using the test statistic in Section 12.3.4. The appropriate likelihood ratio statistic is

$$\lambda_{LR} = T\left(\ln|\hat{\Sigma}_{\text{homoscedastic}}| - \ln|\hat{\Sigma}_{\text{heteroscedastic}}|\right) = (nT)\ln\hat{\sigma}^2 - \sum_{i=1}^{n}T\ln\hat{\sigma}_i^2,$$

where

$$\hat{\sigma}^2 = \frac{e'e}{nT} \quad\text{and}\quad \hat{\sigma}_i^2 = \frac{\hat{\varepsilon}_i'\hat{\varepsilon}_i}{T},$$

with all residuals computed using the maximum likelihood estimators. This chi-squared statistic has $n - 1$ degrees of freedom.

13.9.7 APPLICATION TO GRUNFELD'S INVESTMENT DATA

To illustrate the techniques developed in this section, we will use a panel of data that has for several decades provided a useful tool for examining multiple equation estimators. Appendix Table F13.1 lists part of the data used in a classic study of investment demand.$^{39}$ The data consist of time series of 20 yearly o
bservations for five firms (of 10 in the original study) and three variables:

$I_{it}$ = gross investment,
$F_{it}$ = market value of the firm at the end of the previous year,
$C_{it}$ = value of the stock of plant and equipment at the end of the previous year.

All figures are in millions of dollars. The variables $F_{it}$ and $C_{it}$ reflect anticipated profit and the expected amount of replacement investment required.$^{40}$ The model to be estimated with these data is

$$I_{it} = \beta_1 + \beta_2 F_{it} + \beta_3 C_{it} + \varepsilon_{it},^{41}$$

where $i$ indexes firms and $t$ indexes years. Different restrictions on the parameters and the variances and covariances of the disturbances will imply different forms of the model. By pooling all 100 observations and estimating the coefficients by ordinary least squares, we obtain the first set of results in Table 13.4. To make the results comparable, all variance estimates and estimated standard errors are based on $e'e/(nT)$. There is no degrees of freedom correction. The second set of standard errors given are White's robust estimator [see (10-14) and (10-23)]. The third set of standard errors given above are the robust standard errors based on Beck and Katz (1995) using (13-56) and (13-54).

$^{39}$ See Grunfeld (1958) and Grunfeld and Griliches (1960). The data were also used in Boot and deWitt (1960). Although admittedly not current, these data are unusually cooperative for illustrating the different aspects of estimating systems of regression equations.
$^{40}$ In the original study, the authors used the notation $F_{t-1}$ and $C_{t-1}$. To avoid possible conflicts with the usual subscripting conventions used here, we have used the preceding notation instead.
$^{41}$ Note that we are modeling investment, a flow, as a function of two stocks. This could be a theoretical misspecification; it might be preferable to specify the model in terms of planned investment. But, 40 years after the fact, we'll take the specified model as it is.

TABLE 13.4 Estimated Parameters and Estimated Standard Errors

                                 β1           β2            β3
Homoscedasticity
  Least squares                −48.0297     0.10509       0.30537
    R² = 0.77886, σ̂² = 15708.84, log-likelihood = −624.9928
  OLS standard errors          (21.16)      (0.01121)     (0.04285)
  White correction             (15.017)     (0.00915)     (0.05911)
  Beck and Katz                (10.814)     (0.00832)     (0.033043)
Heteroscedastic
  Feasible GLS                 −36.2537     0.09499       0.33781
                               (6.1244)     (0.00741)     (0.03023)
  Maximum likelihood           −23.2582     0.09435       0.33371
                               (4.815)      (0.00628)     (0.2204)
    Pooled σ̂² = 15,853.08, log-likelihood = −564.535
Cross-section correlation
  Feasible GLS                 −28.247      0.089101      0.33401
                               (4.888)      (0.005072)    (0.01671)
  Maximum likelihood           −2.217       0.02361       0.17095
                               (1.96)       (0.004291)    (0.01525)
    log-likelihood = −515.422
Autocorrelation model
  Heteroscedastic              −23.811      0.086051      0.33215
                               (7.694)      (0.009599)    (0.03549)
  Cross-section correlation    −15.424      0.07522       0.33807
                               (4.595)      (0.005710)    (0.01421)

The estimates of $\sigma_i^2$ for the model of groupwise heteroscedasticity are shown in Table 13.5. The estimates suggest that the disturbance variance differs widely across firms. To investigate this proposition before fitting an extended model, we can use the tests for homoscedasticity suggested earlier. Based on the OLS results, the LM statistic equals 46.63. The critical value from the chi-squared distribution with four degrees of freedom is 9.49, so on the basis of the LM test, we reject the null hypothesis of homoscedasticity. To compute White's test statistic, we regress the squared least squares residuals on a constant, $F$, $C$, $F^2$, $C^2$, and $FC$. The $R^2$ in this regression is 0.36854, so the chi-squared statistic is $(nT)R^2 = 36.854$ with five degrees of freedom. The five percent critical value from the table for the chi-squared statistic with five degrees of freedom is 11.07, so the null hypothesis is rejected again. The likelihood ratio statistic, based on the ML results in Table 13.4, is

$$\chi^2 = 100\ln s^2 - \sum_{i=1}^{n}20\ln\hat{\sigma}_i^2 = 120.915.$$

This result far exceeds the tabled critical value. The Lagrange multiplier statistic based on all variances computed using the OLS residuals is 46.629. The Wald statistic based on the FGLS estimated variances and the pooled OLS estimate (15,708.84) is 17,676.25. We observe the common occurrence of an extremely large Wald test statistic. (If the test is based on the sum of squared FGLS residuals, $\hat{\sigma}^2 = 15{,}853.08$, then $W = 18{,}012.86$, which leads to the same conclusion.) To compute the modified Wald statistic absent the assumption of normality, we require the estimates of the variances of the FGLS residual variances. The square roots of $f_{ii}$ are shown in Table 13.5 in parentheses after the FGLS residual variances. The modified Wald statistic is $W' = 14{,}681.3$, which is consistent with the other results. We proceed to reestimate the regression allowing for heteroscedasticity. The FGLS and maximum likelihood estimates are shown in Table 13.4. (The latter are obtained by iterated FGLS.)

TABLE 13.5 Estimated Group Specific Variances

                               σ²(GM)       σ²(CH)      σ²(GE)       σ²(WE)      σ²(US)
Based on OLS                   9,410.91     755.85      34,288.49    633.42      33,455.51
Heteroscedastic FGLS           8,612.14     409.19      36,563.24    777.97      32,902.83
                               (2897.08)    (136.704)   (5801.17)    (323.357)   (7000.857)
Heteroscedastic ML             8,657.72     175.80      40,210.96    1,240.03    29,825.21
Cross Correlation FGLS         10050.52     305.61      34556.6      833.36      34468.98
Autocorrelation, s²(ui) (ui)   6525.7       253.104     14,620.8     232.76      8,683.9
Autocorrelation, s²(ei) (ei)   8453.6       270.150     16,073.2     349.68      12,994.2

Returning to the least squares estimator, we should expect the OLS standard errors to be incorrect, given our findings. There are two possible corrections we can use, the White estimator and direct computation of the appropriate asymptotic covariance matrix. The Beck et al. estimator is a third candidate, but it neglects to use the known restriction that the off-diagonal elements in $\Sigma$ are zero. The various estimates shown at the top of Table 13.4 do suggest that the OLS estimated standard errors have been distorted.

The correlation matrix for the various sets of residuals, using the estimates in Table 13.4, is given in Table 13.6.$^{42}$ The several quite large values suggest that the more general model will be appropriate. The two test statistics for testing the null hypothesis of a diagonal $\Sigma$, based on the log-likelihood values in Table 13.4, are

$$\lambda_{LR} = -2(-565.535 - (-515.422)) = 100.226$$

and, based on the MLEs for the groupwise heteroscedasticity model, $\lambda_{LM} = 66.067$ (the MLE of $\Sigma$ based on the coefficients from the heteroscedastic model is not shown). For 10 degrees of freedom, the critical value from the chi-squared table is 23.21, so both results lead to rejection of the null hypothesis of a diagonal $\Sigma$. We conclude that the simple heteroscedastic model is not general enough for these data.

$^{42}$ The estimates based on the MLEs are somewhat different, but the results of all the hypothesis tests are the same.

TABLE 13.6 Estimated Cross-Group Correlations Based on FGLS Estimates
(Within each cell the order is OLS, FGLS heteroscedastic, FGLS correlation, Autocorrelation)

          GM         CH         GE         WE         US
GM         1
CH      −0.344        1
        −0.185
        −0.349
        −0.225
GE      −0.182      0.283        1
        −0.185      0.144
        −0.248      0.158
        −0.287      0.105
WE      −0.352      0.343      0.890        1
        −0.469      0.186      0.881
        −0.356      0.246      0.895
        −0.467      0.166      0.885
US      −0.121      0.167     −0.151     −0.085        1
        −0.016      0.222     −0.122     −0.119
        −0.716      0.244     −0.176     −0.040
        −0.015      0.245     −0.139     −0.101

If the null hypothesis is that the disturbances are both homoscedastic and uncorrelated across groups, then these two tests are inappropriate. A likelihood ratio test can be constructed using the OLS results and the MLEs from the full model; the test statistic would be

$$\lambda_{LR} = (nT)\ln(e'e/nT) - T\ln|\hat{\Sigma}|.$$

This statistic is just the sum of the LR statistics for the test of homoscedasticity and the statistic given above. For these data, this sum would be 120.915 + 100.226 = 221.141, which is far larger than the critical value, as might be expected.

FGLS and maximum likelihood estimates for the model with cross-sectional correlation are given in Table 13.4. The estimated disturbance variances have changed dramatically, due in part to the quite large off-diagonal elements. It is noteworthy, however, that despite the large changes in $\hat{\Sigma}$, with the exceptions of the MLEs in the cross section correlation model, the parameter estimates have not changed very much. (This sample is moderately large and all estimators are consistent, so this result is to be expected.)

We shall examine the effect of assuming that all five firms have the same slope parameters in Section 14.2.3. For now, we note that one of the effects is to inflate the disturbance correlations. When the Lagrange multiplier statistic in (13-68) is recomputed with firm-by-firm separate regressions, the statistic falls to 29.04, which is still significant, but far less than what we found earlier.

We now allow for different AR(1) disturbance processes for each firm. The firm specific autocorrelation coefficients of the ordinary least squares residuals are

$$r = (0.478,\ -0.251,\ 0.301,\ 0.578,\ 0.576).$$

[An interesting problem arises at this point. If one computes these autocorrelations using the standard formula, then the results can be substantially affected because the group-specific residuals may not have mean zero. Since the population mean is zero if the model is correctly spe
cified, then this point is only minor. As we will explore later, however, this model is not correctly specified for these data. As such, the nonzero residual mean for the group specific residual vectors matters greatly. The vector of autocorrelations computed without using deviations from means is $r_0 = (0.478, 0.793, 0.905, 0.602, 0.868)$. Three of the five are very different. Which way the computations should be done now becomes a substantive question. The asymptotic theory weighs in favor of (13-62). As a practical matter, in small or moderately sized samples such as this one, as this example demonstrates, the mean deviations are preferable.]

Table 13.4 also presents estimates for the groupwise heteroscedasticity model and for the full model with cross-sectional correlation, with the corrections for first-order autocorrelation. The lower part of the table displays the recomputed group specific variances and cross-group correlations.

13.9.8 SUMMARY

The preceding sections have suggested a variety of different specifications of the generalized regression model. Which ones apply in a given situation depends on the setting. Homoscedasticity will depend on the nature of the data and will often be directly observable at the outset. Uncorrelatedness across the cross-sectional units is a strong assumption, particularly because the model assigns the same parameter vector to all units. Autocorrelation is a qualitatively different property. Although it does appear to arise naturally in time-series data, one would want to look carefully at the data and the model specification before assuming that it is present. The properties of all these estimators depend on an increase in $T$, so they are generally not well suited to the types of data sets described in Sections 13.2–13.8.

Beck et al. (1993) suggest several problems that might arise when using this model in small samples. If $T$ [...]

[...] $K_i$. The data are assumed to be well behaved, as described in Section 5.2.1, and we shall not treat the issue separately here. For the present, we also assume that disturbances are uncorrelated across observations. Therefore,

$$E[\varepsilon_{it}\varepsilon_{js} \mid X_1, X_2, \ldots, X_M] = \sigma_{ij}, \text{ if } t = s \text{ and } 0 \text{ otherwise}.$$

The disturbance formulation is therefore

$$E[\varepsilon_i\varepsilon_j' \mid X_1, X_2, \ldots, X_M] = \sigma_{ij}I_T$$

or

$$E[\varepsilon\varepsilon' \mid X_1, X_2, \ldots, X_M] = \Omega = \begin{bmatrix}\sigma_{11}I & \sigma_{12}I & \cdots & \sigma_{1M}I\\ \sigma_{21}I & \sigma_{22}I & \cdots & \sigma_{2M}I\\ & & \vdots & \\ \sigma_{M1}I & \sigma_{M2}I & \cdots & \sigma_{MM}I\end{bmatrix}. \tag{14-3}$$

Note that when the data matrices are group specific observations on the same variables, as in Example 14.1, the specification of this model is precisely that of the covariance structures model of Section 13.9 save for the extension here that allows the parameter vector to vary across groups. The covariance structures model is, therefore, a testable special case.$^{4}$

It will be convenient in the discussion below to have a term for the particular kind of model in which the data matrices are group specific data sets on the same set of variables. The Grunfeld model noted in Example 14.1 is such a case. This special case of the seemingly unrelated regressions model is a multivariate regression model. In contrast, the cost function model examined in Section 14.5 is not of this type; it consists of a cost function that involves output and prices and a set of cost share equations that have only a set of constant terms. We emphasize, this is merely a convenient term for a specific form of the SUR model, not a modification of the model itself.

$^{3}$ There are a few results for unequal numbers of observations, such as Schmidt (1977), Baltagi, Garvin, and Kerman (1989), Conniffe (1985), Hwang (1990) and Im (1994). But generally, the case of fixed $T$ is the norm in practice.
$^{4}$ This is the test of "Aggregation Bias" that is the subject of Zellner (1962, 1963). (The bias results if parameter equality is incorrectly assumed.)

CHAPTER 14 ✦ Systems of Regression Equations

14.2.1 GENERALIZED LEAST SQUARES

Each equation is, by itself, a classical regression. Therefore, the parameters could be estimated consistently, if not efficiently, one equation at a time by ordinary least squares. The generalized regression model applies to the stacked model,

$$\begin{bmatrix}y_1\\ y_2\\ \vdots\\ y_M\end{bmatrix} = \begin{bmatrix}X_1 & 0 & \cdots & 0\\ 0 & X_2 & \cdots & 0\\ & & \vdots & \\ 0 & 0 & \cdots & X_M\end{bmatrix}\begin{bmatrix}\beta_1\\ \beta_2\\ \vdots\\ \beta_M\end{bmatrix} + \begin{bmatrix}\varepsilon_1\\ \varepsilon_2\\ \vdots\\ \varepsilon_M\end{bmatrix} = X\beta + \varepsilon. \tag{14-4}$$

Therefore, the efficient estimator is generalized least squares.$^{5}$ The model has a particularly convenient form. For the $t$th observation, the $M \times M$ covariance matrix of the disturbances is

$$\Sigma = \begin{bmatrix}\sigma_{11} & \sigma_{12} & \cdots & \sigma_{1M}\\ \sigma_{21} & \sigma_{22} & \cdots & \sigma_{2M}\\ & & \vdots & \\ \sigma_{M1} & \sigma_{M2} & \cdots & \sigma_{MM}\end{bmatrix}, \tag{14-5}$$

so, in (14-3),

$$\Omega = \Sigma \otimes I \quad\text{and}\quad \Omega^{-1} = \Sigma^{-1} \otimes I. \tag{14-6}$$

Denoting the $ij$th element of $\Sigma^{-1}$ by $\sigma^{ij}$, we find that the GLS estimator is

$$\hat{\beta} = [X'\Omega^{-1}X]^{-1}X'\Omega^{-1}y = [X'(\Sigma^{-1} \otimes I)X]^{-1}X'(\Sigma^{-1} \otimes I)y.$$

Expanding the Kronecker products produces

$$\hat{\beta} = \begin{bmatrix}\sigma^{11}X_1'X_1 & \sigma^{12}X_1'X_2 & \cdots & \sigma^{1M}X_1'X_M\\ \sigma^{21}X_2'X_1 & \sigma^{22}X_2'X_2 & \cdots & \sigma^{2M}X_2'X_M\\ & & \vdots & \\ \sigma^{M1}X_M'X_1 & \sigma^{M2}X_M'X_2 & \cdots & \sigma^{MM}X_M'X_M\end{bmatrix}^{-1}\begin{bmatrix}\sum_{j=1}^{M}\sigma^{1j}X_1'y_j\\ \sum_{j=1}^{M}\sigma^{2j}X_2'y_j\\ \vdots\\ \sum_{j=1}^{M}\sigma^{Mj}X_M'y_j\end{bmatrix}. \tag{14-7}$$

The asymptotic covariance matrix for the GLS estimator is the inverse matrix in (14-7). All the results of Chapter 10 for the generalized regression model extend to this model (which has both heteroscedasticity and "autocorrelation").

This estimator is obviously different from ordinary least squares. At this point, however, the equations are linked only by their disturbances (hence the name seemingly unrelated regressions model), so it is interesting to ask just how much efficiency is gained by using generalized least squares instead of ordinary least squares. Zellner (1962) and Dwivedi and Srivastava (1978) have analyzed some special cases in detail.

$^{5}$ See Zellner (1962) and Telser (1964).

1. If the equations are actually unrelated, that is, if $\sigma_{ij} = 0$ for $i \ne j$, then there is obviously no payoff to GLS estimation of the full set of equations. Indeed, full GLS is equation by equation OLS.$^{6}$
2. If the equations have identical explanatory variables, that is, if $X_i = X_j$, then OLS and GLS are identical. We will turn to this case in Section 14.2.2 and then examine an important application in Section 14.2.5.$^{7}$
3. If the regressors in one block of equations are a subset of those in another, then GLS brings no efficiency gain over OLS in estimation of the smaller set of equations; thus, GLS and OLS are once again identical. We will look at an application of this result in Section 19.6.5.$^{8}$

In the more general case, with unrestricted correlation of the disturbances and different regressors in the equations, the results are complicated and dependent on the data. Two propositions that apply generally are as follows:

1. The greater is the correlation of the disturbances, the greater is the efficiency gain accruing to GLS.
2. The less correlation there is between the $X$ matrices, the greater is the gain in efficiency in using GLS.$^{9}$

14.2.2 SEEMINGLY UNRELATED REGRESSIONS WITH IDENTICAL REGRESSORS

The case of identical regressors is quite common, notably in the capital asset pricing model in empirical finance (see Section 14.2.5). In this special case, generalized least squares is equivalent to equation by equation ordinary least squares. Impose the assumption that $X_i = X_j = X$, so that $X_i'X_j = X'X$ for all $i$ and $j$ in (14-7). The inverse matrix on t
heright-handsidenowbecomes[−1⊗XX]−1,which,using(A-76),equals[⊗(XX)−1].Alsoontheright-handside,eachtermXyequalsXy,which,inturnijjequalsXXb.Withtheseresults,aftermovingthecommonXXoutofthesummationsjontheright-handside,weobtainM1lσ11(XX)−1σ12(XX)−1···σ1M(XX)−1(XX)l=1σblσ21(XX)−1σ22(XX)−1···σ2M(XX)−1(XX)Mσ2lbβˆ=.l=1l.(14-8).....−1−1−1σM1(XX)σM2(XX)···σMM(XX)(XX)MσMlbl=1l6SeealsoBaltagi(1989)andBartelsandFeibig(1991)forothercasesinwhichOLS=GLS.7Anintriguingresult,albeitprobablyofnegligiblepracticalsignificance,isthattheresultalsoappliesiftheX’sareallnonsingular,andnotnecessarilyidentical,linearcombinationsofthesamesetofvariables.TheformalresultwhichisacorollaryofKruskal’sTheorem[seeDavidsonandMacKinnon(1993,p.294)]isthatOLSandGLSwillbethesameiftheKcolumnsofXarealinearcombinationofexactlyKcharacteristicvectorsof.ByshowingtheequalityofOLSandGLShere,wehaveverifiedtheconditionsofthecorollary.Thegeneralresultispursuedintheexercises.Theintriguingresultcitedisnowanobviouscase.8TheresultwasanalyzedbyGoldberger(1970)andlaterbyRevankar(1974)andConniffe(1982a,b).9SeealsoBinkley(1982)andBinkleyandNelson(1988).\nGreene-50240bookJune19,200210:4344CHAPTER14✦SystemsofRegressionEquationsNow,weisolateoneofthesubvectors,saythefirst,fromβˆ.Aftermultiplication,themomentmatricescancel,andweareleftwithMMMMMβˆ=σσj1b=bσσj1+bσσj2+···+bσσjM.11jl11j21jM1jj=1l=1j=1j=1j=1Thetermsinparenthesesaretheelementsofthefirstrowof−1=I,sotheendresultisβˆ1=b1.Fortheremainingsubvectors,whichareobtainedthesameway,βˆi=bi,whichistheresultwesought.10Toreiterate,theimportantresultwehavehereisthatintheSURmodel,whenallequationshavethesameregressors,theefficientestimatorissingle-equationordinaryleastsquares;OLSisthesameasGLS.Also,theasymptoticcovariancematrixofβˆforthiscaseisgivenbythelargeinversematrixinbracketsin(14-8),whichwouldbeestimatedby1Est.Asy.Cov[βˆ,βˆ]=σˆ(XX)−1,i,j=1,...,M,whereˆ=σˆ=ee.ijijijijijTExceptinsomespecialcases,thisgeneralresultislostifthereareanyrestrictionsonβ,eitherwithinoracrossequations.Wewillex
amineoneofthosecases,theblockofzerosrestriction,inSections14.2.6and19.6.5.14.2.3FEASIBLEGENERALIZEDLEASTSQUARESTheprecedingdiscussionassumesthatisknown,which,asusual,isunlikelytobethecase.FGLSestimatorshavebeendevised,however.11Theleastsquaresresidualsmaybeused(ofcourse)toestimateconsistentlytheelementsofwitheeijσˆijsij=.(14-9)TTheconsistencyofsijfollowsfromthatofbiandbj.Adegreesoffreedomcorrectioninthedivisorisoccasionallysuggested.Twopossibilitiesareeeees∗=ijands∗∗=ij.12ij[(T−K)(T−K)]1/2ijT−max(K,K)ijijThesecondisunbiasedonlyifiequalsjorKiequalsKj,whereasthefirstisunbiasedonlyifiequalsj.WhetherunbiasednessoftheestimateofusedforFGLSisavirtueˆˆhereisuncertain.TheasymptoticpropertiesofthefeasibleGLSestimator,βdonotrelyonanunbiasedestimatorof;onlyconsistencyisrequired.AllourresultsfromChapters10–13forFGLSestimatorsextendtothismodel,withnomodification.We10SeeHashimotoandOhtani(1996)fordiscussionofhypothesistestinginthiscase.11SeeZellner(1962)andZellnerandHuang(1962).12See,aswell,Judgeetal.(1985),Theil(1971)andSrivistavaandGiles(1987).\nGreene-50240bookJune19,200210:4CHAPTER14✦SystemsofRegressionEquations345shalluse(14-9)inwhatfollows.Withs11s12···s1Ms21s22···s2MS=.(14-10)..sM1sM2···sMMinhand,FGLScanproceedasusual.IteratedFGLSwillbemaximumlikelihoodifitisbasedon(14-9).Goodness-of-fitmeasuresforthesystemhavebeendevised.Forinstance,McElroy(1977)suggestedthesystemwidemeasureεˆˆ−1εˆMR2=1−=1−,(14-11)∗MMijTtr(ˆ−1S)i=1j=1σˆt=1(yit−y¯i)(yjt−y¯j)yywhereˆindicatestheFGLSestimate.(TheadvantageofthesecondformulationisthatitinvolvesM×Mmatrices,whicharetypicallyquitesmall,whereasˆisMT×MT.Inourcase,Mequals5,butMTequals100.)Themeasureisboundedby0and1andisrelatedtotheFstatisticusedtotestthehypothesisthatalltheslopesinthemodelarezero.FitmeasuresinthisgeneralizedregressionmodelhavealltheshortcomingsdiscussedinSection10.5.1.Anadditionalproblemforthismodelisthatoverallfitmeasuressuchasthatin(14-11)willobscurethevariationinfitacrossequations.Fortheinvestmentexample,usingtheFGLSresidualsforth
e least restrictive model in Table 13.4 (the covariance structures model with identical coefficient vectors), McElroy's measure gives a value of 0.846. But as can be seen in Figure 14.1, this apparently good overall fit is an aggregate of mediocre fits for Chrysler and Westinghouse and obviously terrible fits for GM, GE, and U.S. Steel. Indeed, the conventional measure for GE based on the same FGLS residuals, $1 - \mathbf{e}_{GE}'\mathbf{e}_{GE}/\mathbf{y}_{GE}'\mathbf{M}^0\mathbf{y}_{GE}$, is −16.7!

FIGURE 14.1 FGLS Residuals with Equality Restrictions. (Residuals plotted by firm: General Motors, Chrysler, General Electric, Westinghouse, U.S. Steel.)

FIGURE 14.2 SUR Residuals. (Residuals plotted by firm: General Motors, Chrysler, General Electric, Westinghouse, U.S. Steel.)

We might use (14-11) to compare the fit of the unrestricted model with separate coefficient vectors for each firm with the restricted one with a common coefficient vector. The result in (14-11) with the FGLS residuals based on the seemingly unrelated regression estimates in Table 14.1 (in Example 14.2) gives a value of 0.871, which compared to 0.846 appears to be an unimpressive improvement in the fit of the model. But a comparison of the residual plot in Figure 14.2 with that in Figure 14.1 shows that, on the contrary, the fit of the model has improved dramatically. The upshot is that although a fit measure for the system might have some virtue as a descriptive measure, it should be used with care.

For testing a hypothesis about $\boldsymbol{\beta}$, a statistic analogous to the $F$ ratio in multiple regression analysis is

$$F[J, MT-K] = \frac{(\mathbf{R}\hat{\boldsymbol{\beta}}-\mathbf{q})'[\mathbf{R}(\mathbf{X}'\hat{\boldsymbol{\Omega}}^{-1}\mathbf{X})^{-1}\mathbf{R}']^{-1}(\mathbf{R}\hat{\boldsymbol{\beta}}-\mathbf{q})/J}{\hat{\boldsymbol{\varepsilon}}'\hat{\boldsymbol{\Omega}}^{-1}\hat{\boldsymbol{\varepsilon}}/(MT-K)}. \tag{14-12}$$

The computation requires the unknown $\boldsymbol{\Omega}$. If we insert the FGLS estimate $\hat{\boldsymbol{\Omega}}$ based on (14-9) and use the result that the denominator converges to one, then, in large samples, the statistic will behave the same as

$$\hat{F} = \frac{1}{J}(\mathbf{R}\hat{\hat{\boldsymbol{\beta}}}-\mathbf{q})'\left[\mathbf{R}\,\widehat{\operatorname{Var}}[\hat{\hat{\boldsymbol{\beta}}}]\,\mathbf{R}'\right]^{-1}(\mathbf{R}\hat{\hat{\boldsymbol{\beta}}}-\mathbf{q}). \tag{14-13}$$

This can be referred to the standard $F$ table. Because it uses the estimated $\boldsymbol{\Sigma}$, even with normally distributed disturbances, the $F$ distribution is only valid approximately. In general, the statistic $F[J, n]$ converges to $1/J$ times a chi-squared$[J]$ as $n \to \infty$. Therefore, an alternative test statistic that has a limiting chi-squared distribution with $J$ degre
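es of freedom when the hypothesis is true is

$$J\hat{F} = (\mathbf{R}\hat{\hat{\boldsymbol{\beta}}}-\mathbf{q})'\left[\mathbf{R}\,\widehat{\operatorname{Var}}[\hat{\hat{\boldsymbol{\beta}}}]\,\mathbf{R}'\right]^{-1}(\mathbf{R}\hat{\hat{\boldsymbol{\beta}}}-\mathbf{q}). \tag{14-14}$$

This can be recognized as a Wald statistic that measures the distance between $\mathbf{R}\hat{\hat{\boldsymbol{\beta}}}$ and $\mathbf{q}$. Both statistics are valid asymptotically, but (14-13) may perform better in a small or moderately sized sample.¹³ Once again, the divisor used in computing $\hat{\sigma}_{ij}$ may make a difference, but there is no general rule.

A hypothesis of particular interest is the homogeneity restriction of equal coefficient vectors in the multivariate regression model. That case is fairly common in this setting. The homogeneity restriction is that $\boldsymbol{\beta}_i = \boldsymbol{\beta}_M$, $i = 1, \ldots, M-1$. Consistent with (14-13)–(14-14), we would form the hypothesis as

$$\mathbf{R}\boldsymbol{\beta} = \begin{bmatrix} \mathbf{I} & \mathbf{0} & \cdots & \mathbf{0} & -\mathbf{I} \\ \mathbf{0} & \mathbf{I} & \cdots & \mathbf{0} & -\mathbf{I} \\ & & \cdots & & \\ \mathbf{0} & \mathbf{0} & \cdots & \mathbf{I} & -\mathbf{I} \end{bmatrix} \begin{pmatrix} \boldsymbol{\beta}_1 \\ \boldsymbol{\beta}_2 \\ \cdots \\ \boldsymbol{\beta}_M \end{pmatrix} = \begin{pmatrix} \boldsymbol{\beta}_1-\boldsymbol{\beta}_M \\ \boldsymbol{\beta}_2-\boldsymbol{\beta}_M \\ \cdots \\ \boldsymbol{\beta}_{M-1}-\boldsymbol{\beta}_M \end{pmatrix} = \mathbf{0}. \tag{14-15}$$

This specifies a total of $(M-1)K$ restrictions on the $KM \times 1$ parameter vector. Denote the estimated asymptotic covariance for $(\hat{\hat{\boldsymbol{\beta}}}_i, \hat{\hat{\boldsymbol{\beta}}}_j)$ as $\hat{\mathbf{V}}_{ij}$. Because the $i$th block row of $\mathbf{R}$ contains $\mathbf{I}$ in position $i$ and $-\mathbf{I}$ in position $M$, the bracketed matrix in (14-13) would have typical block

$$\left[\mathbf{R}\,\widehat{\operatorname{Var}}[\hat{\hat{\boldsymbol{\beta}}}]\,\mathbf{R}'\right]_{ij} = \hat{\mathbf{V}}_{ij} - \hat{\mathbf{V}}_{iM} - \hat{\mathbf{V}}_{Mj} + \hat{\mathbf{V}}_{MM}.$$

This may be a considerable amount of computation. The test will be simpler if the model has been fit by maximum likelihood, as we examine in the next section.

14.2.4 MAXIMUM LIKELIHOOD ESTIMATION

The Oberhofer–Kmenta (1974) conditions (see Section 11.7.2) are met for the seemingly unrelated regressions model, so maximum likelihood estimates can be obtained by iterating the FGLS procedure. We note, once again, that this procedure presumes the use of (14-9) for estimation of $\sigma_{ij}$ at each iteration. Maximum likelihood enjoys no advantages over FGLS in its asymptotic properties.¹⁴ Whether it would be preferable in a small sample is an open question whose answer will depend on the particular data set.

By simply inserting the special form of $\boldsymbol{\Omega}$ in the log-likelihood function for the generalized regression model in (10-32), we can consider direct maximization instead of iterated FGLS. It is useful, however, to reexamine the model in a somewhat different formulation. This alternative construction of the likelihood function appears in many other related models in a number of literatures.

¹³ See Judge et al. (1985, p. 476). The Wald statistic often performs poorly in the small sample sizes typical in this area. Fiebig (2001, pp. 108–110) surveys a recent literature on methods of improving the power of testing procedures in SUR models.
¹⁴ Jense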
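n (1995) considers some variation on the computation of the asymptotic covariance matrix for the estimator that allows for the possibility that the normality assumption might be violated.

Consider one observation on each of the $M$ dependent variables and their associated regressors. We wish to arrange this observation horizontally instead of vertically. The model for this observation can be written

$$[y_1\ \ y_2\ \cdots\ y_M]_t = [\mathbf{x}_t^*]'[\boldsymbol{\pi}_1\ \ \boldsymbol{\pi}_2\ \cdots\ \boldsymbol{\pi}_M] + [\varepsilon_1\ \ \varepsilon_2\ \cdots\ \varepsilon_M]_t = [\mathbf{x}_t^*]'\boldsymbol{\Pi}' + \mathbf{E}, \tag{14-16}$$

where $\mathbf{x}_t^*$ is the full set of all $K^*$ different independent variables that appear in the model. The parameter matrix then has one column for each equation, but the columns are not the same as $\boldsymbol{\beta}_i$ in (14-4) unless every variable happens to appear in every equation. Otherwise, in the $i$th equation, $\boldsymbol{\pi}_i$ will have a number of zeros in it, each one imposing an exclusion restriction. For example, consider the GM and GE equations from the Boot–de Witt data in Example 14.1. The $t$th observation would be

$$[I_g\ \ I_e]_t = [1\ \ F_g\ \ C_g\ \ F_e\ \ C_e]_t \begin{bmatrix} \alpha_g & \alpha_e \\ \beta_{1g} & 0 \\ \beta_{2g} & 0 \\ 0 & \beta_{1e} \\ 0 & \beta_{2e} \end{bmatrix} + [\varepsilon_g\ \ \varepsilon_e]_t.$$

This vector is one observation. Let $\boldsymbol{\varepsilon}_t$ be the vector of $M$ disturbances for this observation arranged, for now, in a column. Then $E[\boldsymbol{\varepsilon}_t\boldsymbol{\varepsilon}_t'] = \boldsymbol{\Sigma}$. The log of the joint normal density of these $M$ disturbances is

$$\log L_t = -\frac{M}{2}\log(2\pi) - \frac{1}{2}\log|\boldsymbol{\Sigma}| - \frac{1}{2}\boldsymbol{\varepsilon}_t'\boldsymbol{\Sigma}^{-1}\boldsymbol{\varepsilon}_t. \tag{14-17}$$

The log-likelihood for a sample of $T$ joint observations is the sum of these over $t$:

$$\log L = \sum_{t=1}^{T}\log L_t = -\frac{MT}{2}\log(2\pi) - \frac{T}{2}\log|\boldsymbol{\Sigma}| - \frac{1}{2}\sum_{t=1}^{T}\boldsymbol{\varepsilon}_t'\boldsymbol{\Sigma}^{-1}\boldsymbol{\varepsilon}_t. \tag{14-18}$$

The term in the summation in (14-18) is a scalar that equals its trace. We can always permute the matrices in a trace, so

$$\sum_{t=1}^{T}\boldsymbol{\varepsilon}_t'\boldsymbol{\Sigma}^{-1}\boldsymbol{\varepsilon}_t = \sum_{t=1}^{T}\operatorname{tr}(\boldsymbol{\varepsilon}_t'\boldsymbol{\Sigma}^{-1}\boldsymbol{\varepsilon}_t) = \sum_{t=1}^{T}\operatorname{tr}(\boldsymbol{\Sigma}^{-1}\boldsymbol{\varepsilon}_t\boldsymbol{\varepsilon}_t').$$

This can be further simplified. The sum of the traces of $T$ matrices equals the trace of the sum of the matrices [see (A-91)]. We will now also be able to move the constant matrix, $\boldsymbol{\Sigma}^{-1}$, outside the summation. Finally, it will prove useful to multiply and divide by $T$. Combining all three steps, we obtain

$$\sum_{t=1}^{T}\operatorname{tr}(\boldsymbol{\Sigma}^{-1}\boldsymbol{\varepsilon}_t\boldsymbol{\varepsilon}_t') = T\operatorname{tr}\left[\boldsymbol{\Sigma}^{-1}\frac{1}{T}\sum_{t=1}^{T}\boldsymbol{\varepsilon}_t\boldsymbol{\varepsilon}_t'\right] = T\operatorname{tr}(\boldsymbol{\Sigma}^{-1}\mathbf{W}), \tag{14-19}$$

where

$$W_{ij} = \frac{1}{T}\sum_{t=1}^{T}\varepsilon_{ti}\varepsilon_{tj}.$$

Since this step uses actual disturbances, $E[W_{ij}] = \sigma_{ij}$; $\mathbf{W}$ is the $M \times M$ matrix we would use to estimate $\boldsymbol{\Sigma}$ if the $\varepsilon$s were actually observed. Inserting this result in the log-likelihood, we have

$$\log L = -\frac{T}{2}\left[M\log(2\pi) + \log|\boldsymbol{\Sigma}| + \operatorname{tr}(\boldsymbol{\Sigma}^{-1}\mathbf{W})\right]. \tag{14-20}$$

We now consider maxim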
izing this function. It has been shown¹⁵ that

$$\frac{\partial\log L}{\partial\boldsymbol{\Pi}'} = \frac{T}{2}\mathbf{X}^{*\prime}\mathbf{E}\boldsymbol{\Sigma}^{-1}, \qquad \frac{\partial\log L}{\partial\boldsymbol{\Sigma}} = -\frac{T}{2}\boldsymbol{\Sigma}^{-1}(\boldsymbol{\Sigma}-\mathbf{W})\boldsymbol{\Sigma}^{-1}, \tag{14-21}$$

where the $\mathbf{x}_t^{*\prime}$ in (14-16) is row $t$ of $\mathbf{X}^*$. Equating the second of these derivatives to a zero matrix, we see that given the maximum likelihood estimates of the slope parameters, the maximum likelihood estimator of $\boldsymbol{\Sigma}$ is $\mathbf{W}$, the matrix of mean residual sums of squares and cross products, that is, the matrix we have used for FGLS. [Notice that there is no correction for degrees of freedom; $\partial\log L/\partial\boldsymbol{\Sigma} = \mathbf{0}$ implies (14-9).]

We also know that because this model is a generalized regression model, the maximum likelihood estimator of the parameter matrix $[\boldsymbol{\beta}]$ must be equivalent to the FGLS estimator we discussed earlier.¹⁶ It is useful to go a step further. If we insert our solution for $\boldsymbol{\Sigma}$ in the likelihood function, then we obtain the concentrated log-likelihood,

$$\log L_c = -\frac{T}{2}\left[M(1+\log(2\pi)) + \log|\mathbf{W}|\right]. \tag{14-22}$$

We have shown, therefore, that the criterion for choosing the maximum likelihood estimator of $\boldsymbol{\beta}$ is

$$\hat{\boldsymbol{\beta}}_{ML} = \operatorname{Min}_{\boldsymbol{\beta}}\ \tfrac{1}{2}\log|\mathbf{W}|, \tag{14-23}$$

subject to the exclusion restrictions. This important result reappears in many other models and settings. This minimization must be done subject to the constraints in the parameter matrix. In our two-equation example, there are two blocks of zeros in the parameter matrix, which must be present in the MLE as well. The estimator of $\boldsymbol{\beta}$ is the set of nonzero elements in the parameter matrix in (14-16).

The likelihood ratio statistic is an alternative to the $F$ statistic discussed earlier for testing hypotheses about $\boldsymbol{\beta}$. The likelihood ratio statistic is

$$\lambda = -2(\log L_r - \log L_u) = T(\log|\hat{\mathbf{W}}_r| - \log|\hat{\mathbf{W}}_u|),^{17} \tag{14-24}$$

where $\hat{\mathbf{W}}_r$ and $\hat{\mathbf{W}}_u$ are the residual sums of squares and cross-product matrices using the constrained and unconstrained estimators, respectively. The likelihood ratio statistic is asymptotically distributed as chi-squared with degrees of freedom equal to the number of restrictions. This procedure can also be used to test the homogeneity restriction in the multivariate regression model. The restricted model is the covariance structures model discussed in Section 13.9 in the preceding chapter.

¹⁵ See, for example, Joreskog (1973).
¹⁶ This equivalence establishes the Oberhofer–Kmenta conditions.
¹⁷ See Attfield (1998) for refinements of this calculation to improve the small sample performance.

It may also be of interest to test whether $\boldsymbol{\Sigma}$ is a diagonal matrix. Two possible approaches were suggested in Section 13.9.6 [see (13-67) and (13-68)]. The unrestricted model is the one we are using here, whereas the restricted model is the groupwise heteroscedastic model of Section 11.7.2 (Example 11.5), without the restriction of equal-parameter vectors. As such, the restricted model reduces to separate regression models, estimable by ordinary least squares. The likelihood ratio statistic would be

$$\lambda_{LR} = T\left[\sum_{i=1}^{M}\log\hat{\sigma}_i^2 - \log|\hat{\boldsymbol{\Sigma}}|\right], \tag{14-25}$$

where $\hat{\sigma}_i^2$ is $\mathbf{e}_i'\mathbf{e}_i/T$ from the individual least squares regressions and $\hat{\boldsymbol{\Sigma}}$ is the maximum likelihood estimator of $\boldsymbol{\Sigma}$. This statistic has a limiting chi-squared distribution with $M(M-1)/2$ degrees of freedom under the hypothesis. The alternative suggested by Breusch and Pagan (1980) is the Lagrange multiplier statistic,

$$\lambda_{LM} = T\sum_{i=2}^{M}\sum_{j=1}^{i-1}r_{ij}^2, \tag{14-26}$$

where $r_{ij}$ is the estimated correlation $\hat{\sigma}_{ij}/[\hat{\sigma}_{ii}\hat{\sigma}_{jj}]^{1/2}$. This statistic also has a limiting chi-squared distribution with $M(M-1)/2$ degrees of freedom. This test has the advantage that it does not require computation of the maximum likelihood estimator of $\boldsymbol{\Sigma}$, since it is based on the OLS residuals.

Example 14.2 Estimates of a Seemingly Unrelated Regressions Model

By relaxing the constraint that all five firms have the same parameter vector, we obtain a five-equation seemingly unrelated regression model. The FGLS estimates for the system are given in Table 14.1, where we have included the equality constrained (pooled) estimator from the covariance structures model in Table 13.4 for comparison. The variables are the constant terms, $F$ and $C$, respectively. The correlations of the FGLS and equality constrained FGLS residuals are given below the coefficient estimates in Table 14.1. The assumption of equal-parameter vectors appears to have seriously distorted the correlations computed earlier. We would have expected this based on the comparison of Figures 14.1 and 14.2. The diagonal elements in $\hat{\boldsymbol{\Sigma}}$ are also drastically inflated by the imposition of the homogeneity constraint. The equation by equation OLS estimates are given in Table 14.2. As expected, the estimated standard errors for the FGLS estimates are generally smaller. The $F$ statistic for testing the hypothesis of equal-parameter vectors in all five equatio
ns is 129.169 with 12 and (100 − 15) degrees of freedom. This value is far larger than the tabled critical value of 1.868, so the hypothesis of parameter homogeneity should be rejected. We might have expected this result in view of the dramatic reduction in the diagonal elements of $\hat{\boldsymbol{\Sigma}}$ compared with those of the pooled estimator. The maximum likelihood estimates of the parameters are given in Table 14.3. The log determinant of the unrestricted maximum likelihood estimator of $\boldsymbol{\Sigma}$ is 31.71986, so the log-likelihood is

$$\log L_u = -\frac{20(5)}{2}[\log(2\pi)+1] - \frac{20}{2}\,31.71986 = -459.0925.$$

The restricted model with equal-parameter vectors and correlation across equations is discussed in Section 13.9.6, and the restricted MLEs are given in Table 13.4. (The estimate of $\boldsymbol{\Sigma}$ is not shown there.) The log determinant for the constrained model is 39.1385. The log-likelihood for the constrained model is therefore −515.422. The likelihood ratio test statistic is 112.66. The 1 percent critical value from the chi-squared distribution with 12 degrees of freedom is 26.217, so the hypothesis that the parameters in all five equations are equal is (once again) rejected.

TABLE 14.1 FGLS Parameter Estimates (Standard Errors in Parentheses)

          GM          CH          GE          WE          US         Pooled
β1     −162.36      0.5043     −22.439      1.0889      85.423     −28.247
       (89.46)     (11.51)     (25.52)     (6.2959)    (111.9)     (4.888)
β2      0.12049     0.06955     0.03729     0.05701     0.1015      0.08910
       (0.0216)    (0.0169)    (0.0123)    (0.0114)    (0.0547)    (0.00507)
β3      0.38275     0.3086      0.13078     0.0415      0.3999      0.3340
       (0.0328)    (0.0259)    (0.0221)    (0.0412)    (0.1278)    (0.0167)

FGLS Residual Covariance and Correlation Matrices [Pooled estimates in brackets]
(covariances on and below the diagonal, correlations above the diagonal)

          GM            CH            GE            WE            US
GM      7216.04       −0.299         0.269         0.257        −0.330
      [10050.52]     [−0.349]      [−0.248]      [−0.356]      [−0.716]
CH      −313.70       152.85         0.006         0.238         0.384
       [−4.8051]     [305.61]       [0.158]       [0.246]       [0.244]
GE       605.34        2.0474      700.46          0.777         0.482
      [−7160.67]    [−1966.65]    [34556.6]       [0.895]      [−0.176]
WE       129.89       16.661       200.32         94.912         0.699
      [−1400.75]    [−123.921]    [4274.0]       [833.6]       [−0.040]
US     −2686.5        455.09      1224.4         652.72       9188.2
       [4439.99]    [2158.595]   [−28722.0]     [−2893.7]     [34468.9]

TABLE 14.2 OLS Parameter Estimates (Standard Errors in Parentheses)

          GM          CH          GE          WE          US         Pooled
β1     −149.78     −6.1899      −9.956     −0.5094     −30.369     −48.030
      (105.84)    (13.506)    (31.374)    (8.0152)   (157.05)     (21.480)
β2      0.11928     0.07795     0.02655     0.05289     0.1566      0.10509
       (0.0258)    (0.0198)    (0.0157)    (0.0157)    (0.0789)    (0.01378)
β3      0.37144     0.3157      0.15169     0.0924      0.4239      0.30537
       (0.0371)    (0.0288)    (0.0257)    (0.0561)    (0.1552)    (0.04351)
σ²     7160.29     149.872     660.329     88.662     8896.42     15857.24

Based on the OLS results, the Lagrange multiplier statistic is 29.046, with 10 degrees of freedom. The 1 percent critical value is 23.209, so the hypothesis that $\boldsymbol{\Sigma}$ is diagonal can also be rejected. To compute the likelihood ratio statistic for this test, we would compute the log determinant based on the least squares results. This would be the sum of the logs of the residual variances given in Table 14.2, which is 33.95706. The statistic for the likelihood ratio test using (14-25) is therefore 20(33.95706 − 31.71986) = 44.744. This is also larger than the critical value from the table. Based on all these results, we conclude that neither the parameter homogeneity restriction nor the assumption of uncorrelated disturbances appears to be consistent with our data.

TABLE 14.3 Maximum Likelihood Estimates (Standard Errors in Parentheses)

          GM          CH          GE          WE          US         Pooled
β1     −173.218     2.39111    −16.662      4.37312    136.969      −2.217
       (84.30)     (11.63)     (24.96)     (6.018)     (94.8)      (1.960)
β2      0.122040    0.06741     0.0371      0.05397     0.08865     0.02361
       (0.02025)   (0.01709)   (0.0118)    (0.0103)    (0.0454)    (0.00429)
β3      0.38914     0.30520     0.11723     0.026930    0.31246     0.17095
       (0.03185)   (0.02606)   (0.0217)    (0.03708)   (0.118)     (0.0152)

Residual Covariance Matrix

          GM          CH          GE          WE          US
GM      7307.30
CH      −330.55     155.08
GE       550.27      11.429     741.22
WE       118.83      18.376     220.33      103.13
US     −2879.10     463.21     1408.11      734.83     9671.4

14.2.5 AN APPLICATION FROM FINANCIAL ECONOMETRICS: THE CAPITAL ASSET PRICING MODEL

One of the growth areas in econometrics is its application to the analysis of financial markets.¹⁸ The capital asset pricing model (CAPM) is one of the foundations of that field and is a frequent subject of econometric analysis.

¹⁸ The pioneering work of Campbell, Lo, and MacKinlay (1997) is a broad survey of the field. The development in this example is based on their Chapter 5.

Markowitz (1959) developed a theory of an individual investor's optimal portfolio selection in terms of the trade-off between expected return (mean) and risk (variance). Sharpe (1964) and Lintner (1965) showed how the theory could be extended to the aggregate “market” portfo
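lio. The Sharpe and Lintner analyses produce the following model for the expected excess return from an asset $i$:

$$E[R_i] - R_f = \beta_i\left(E[R_m] - R_f\right),$$

where $R_i$ is the return on asset $i$, $R_f$ is the return on a “risk-free” asset, $R_m$ is the return on the market's optimal portfolio, and $\beta_i$ is the asset's market “beta,”

$$\beta_i = \frac{\operatorname{Cov}[R_i, R_m]}{\operatorname{Var}[R_m]}.$$

The theory states that the expected excess return on asset $i$ will equal $\beta_i$ times the expected excess return on the market's portfolio. Black (1972) considered the more general case in which there is no risk-free asset. In this instance, the observed $R_f$ is replaced by the unobservable return on a “zero-beta” portfolio, $E[R_0] = \gamma$.

The empirical counterpart to the Sharpe and Lintner model for assets, $i = 1, \ldots, N$, observed over $T$ periods, $t = 1, \ldots, T$, is a seemingly unrelated regressions (SUR) model, which we cast in the form of (14-16):

$$[y_1, y_2, \ldots, y_N]_t = [1, z_t]\begin{bmatrix} \alpha_1 & \alpha_2 & \cdots & \alpha_N \\ \beta_1 & \beta_2 & \cdots & \beta_N \end{bmatrix} + [\varepsilon_1, \varepsilon_2, \ldots, \varepsilon_N]_t = \mathbf{x}_t'\boldsymbol{\Pi} + \boldsymbol{\varepsilon}_t',$$

where $y_{it}$ is $R_{it} - R_{ft}$, the observed excess return on asset $i$ in period $t$; $z_t$ is $R_{mt} - R_{ft}$, the market excess return in period $t$; and disturbances $\varepsilon_{it}$ are the deviations from the conditional means. We define the $T \times 2$ matrix $\mathbf{X} = ([1, z_t],\ t = 1, \ldots, T)$. The assumptions of the seemingly unrelated regressions model are

1. $E[\boldsymbol{\varepsilon}_t \mid \mathbf{X}] = E[\boldsymbol{\varepsilon}_t] = \mathbf{0}$,
2. $\operatorname{Var}[\boldsymbol{\varepsilon}_t \mid \mathbf{X}] = E[\boldsymbol{\varepsilon}_t\boldsymbol{\varepsilon}_t' \mid \mathbf{X}] = \boldsymbol{\Sigma}$, a positive definite $N \times N$ matrix,
3. $\boldsymbol{\varepsilon}_t \mid \mathbf{X} \sim N[\mathbf{0}, \boldsymbol{\Sigma}]$.

The data are also assumed to be “well behaved” so that

4. $\operatorname{plim}\bar{z} = E[z_t] = \mu_z$.
5. $\operatorname{plim}s_z^2 = \operatorname{plim}(1/T)\sum_{t=1}^{T}(z_t-\bar{z})^2 = \operatorname{Var}[z_t] = \sigma_z^2$.

Since this model is a particular case of the one in (14-16), we can proceed to (14-20) through (14-23) for the maximum likelihood estimators of $\boldsymbol{\Pi}$ and $\boldsymbol{\Sigma}$. Indeed, since this model is an unrestricted SUR model with the same regressor(s) in every equation, we know from our results in Section 14.2.2 that the GLS and maximum likelihood estimators are simply equation by equation ordinary least squares and that the estimator of $\boldsymbol{\Sigma}$ is just $\mathbf{S}$, the sample covariance matrix of the least squares residuals. The asymptotic covariance matrix for the $2N \times 1$ estimator $[\mathbf{a}, \mathbf{b}]'$ will be

$$\operatorname{Asy.Var}[\mathbf{a}, \mathbf{b}]' = \frac{1}{T}\operatorname{plim}\left[\left(\frac{\mathbf{X}'\mathbf{X}}{T}\right)^{-1}\otimes\boldsymbol{\Sigma}\right] = \frac{1}{T\sigma_z^2}\begin{bmatrix} \sigma_z^2+\mu_z^2 & -\mu_z \\ -\mu_z & 1 \end{bmatrix}\otimes\boldsymbol{\Sigma},$$

which we will estimate with $(\mathbf{X}'\mathbf{X})^{-1}\otimes\mathbf{S}$. [$\operatorname{Plim}\mathbf{z}'\mathbf{z}/T = \operatorname{plim}[(1/T)\sum_t(z_t-\bar{z})^2 + \bar{z}^2] = (\sigma_z^2+\mu_z^2)$.]

The model above does not impose the Markowitz–Sharpe–Lintner hypothesis, $H_0$: $\boldsymbol{\alpha} = \mathbf{0}$. A Wald test of $H_0$ can be based on the unrestricted le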
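ast squares estimates:

$$W = (\mathbf{a}-\mathbf{0})'\left\{\operatorname{Est.Asy.Var}[\mathbf{a}-\mathbf{0}]\right\}^{-1}(\mathbf{a}-\mathbf{0}) = \mathbf{a}'[(\mathbf{X}'\mathbf{X})^{11}\mathbf{S}]^{-1}\mathbf{a} = \left(\frac{Ts_z^2}{s_z^2+\bar{z}^2}\right)\mathbf{a}'\mathbf{S}^{-1}\mathbf{a}.$$

[To carry out this test, we now require that $T$ be greater than or equal to $N$, so that $\mathbf{S} = (1/T)\sum_t\mathbf{e}_t\mathbf{e}_t'$ will have full rank. The assumption was not necessary until this point.] Under the null hypothesis, the statistic has a limiting chi-squared distribution with $N$ degrees of freedom. The small-sample misbehavior of the Wald statistic has been widely observed. An alternative that is likely to be better behaved is $[(T-N-1)/N]W$, which is exactly distributed as $F[N, T-N-1]$ under the null hypothesis. To carry out a likelihood ratio or Lagrange multiplier test of the null hypothesis, we will require the restricted estimates. By setting $\boldsymbol{\alpha} = \mathbf{0}$ in the model, we obtain, once again, a SUR model with identical regressor, so the restricted maximum likelihood estimators are $a_{0i} = 0$ and $b_{0i} = \mathbf{y}_i'\mathbf{z}/\mathbf{z}'\mathbf{z}$. The restricted estimator of $\boldsymbol{\Sigma}$ is, as before, the matrix of mean squares and cross products of the residuals, now $\mathbf{S}_0$. The chi-squared statistic for the likelihood ratio test is given in (14-24); for this application, with $T$ observations as in (14-24), it would be

$$\lambda = T(\ln|\mathbf{S}_0| - \ln|\mathbf{S}|).$$

To compute the LM statistic, we will require the derivatives of the unrestricted log-likelihood function, evaluated at the restricted estimators, which are given in (14-21). For this model, they may be written

$$\frac{\partial\ln L}{\partial\alpha_i} = \sum_{j=1}^{N}\sigma^{ij}\left(\sum_{t=1}^{T}\varepsilon_{jt}\right) = \sum_{j=1}^{N}\sigma^{ij}(T\bar{\varepsilon}_j),$$

where $\sigma^{ij}$ is the $ij$th element of $\boldsymbol{\Sigma}^{-1}$, and

$$\frac{\partial\ln L}{\partial\beta_i} = \sum_{j=1}^{N}\sigma^{ij}\left(\sum_{t=1}^{T}z_t\varepsilon_{jt}\right) = \sum_{j=1}^{N}\sigma^{ij}(\mathbf{z}'\boldsymbol{\varepsilon}_j).$$

The first derivatives with respect to $\boldsymbol{\beta}$ will be zero at the restricted estimates, since the terms in parentheses are the normal equations for restricted least squares; remember, the residuals are now $e_{0it} = y_{it} - b_{0i}z_t$. The first vector of first derivatives can be written as

$$\frac{\partial\ln L}{\partial\boldsymbol{\alpha}} = \boldsymbol{\Sigma}^{-1}\mathbf{E}'\mathbf{i} = \boldsymbol{\Sigma}^{-1}(T\bar{\boldsymbol{\varepsilon}}),$$

where $\mathbf{i}$ is a $T \times 1$ vector of 1s, $\mathbf{E}$ is a $T \times N$ matrix of disturbances, and $\bar{\boldsymbol{\varepsilon}}$ is the $N \times 1$ vector of means of asset specific disturbances. (The second subvector is $\partial\ln L/\partial\boldsymbol{\beta} = \boldsymbol{\Sigma}^{-1}\mathbf{E}'\mathbf{z}$.) Since $\partial\ln L/\partial\boldsymbol{\beta} = \mathbf{0}$ at the restricted estimates, the LM statistic involves only the upper left submatrix of $-\mathbf{H}^{-1}$. Combining terms and inserting the restricted estimates, we obtain

$$\begin{aligned} LM &= \left[T\bar{\mathbf{e}}_0'\mathbf{S}_0^{-1} : \mathbf{0}'\right]\left[\mathbf{X}'\mathbf{X}\otimes\mathbf{S}_0^{-1}\right]^{-1}\left[T\bar{\mathbf{e}}_0'\mathbf{S}_0^{-1} : \mathbf{0}'\right]' \\ &= T^2(\mathbf{X}'\mathbf{X})^{11}\,\bar{\mathbf{e}}_0'\mathbf{S}_0^{-1}\bar{\mathbf{e}}_0 \\ &= T\left(\frac{s_z^2+\bar{z}^2}{s_z^2}\right)\bar{\mathbf{e}}_0'\mathbf{S}_0^{-1}\bar{\mathbf{e}}_0. \end{aligned}$$

Under the null hypothesis, the limiting distribution of LM is chi-squared with $N$ degrees of fre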
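edom.

The model formulation gives $E[R_{it}] = R_{ft} + \beta_i\left(E[R_{mt}] - R_{ft}\right)$. If there is no risk-free asset but we write the model in terms of $\gamma$, the unknown return on a zero-beta portfolio, then we obtain

$$R_{it} = \gamma + \beta_i(R_{mt}-\gamma) + \varepsilon_{it} = (1-\beta_i)\gamma + \beta_iR_{mt} + \varepsilon_{it}.$$

This is essentially the same as the original model, with two modifications. First, the observables in the model are real returns, not excess returns, which defines the way the data enter the model. Second, there are nonlinear restrictions on the parameters; $\alpha_i = (1-\beta_i)\gamma$. Although the unrestricted model has $2N$ free parameters, Black's formulation implies $N-1$ restrictions and leaves $N+1$ free parameters. The nonlinear restrictions will complicate finding the maximum likelihood estimators. We do know from (14-21) that regardless of what the estimators of $\beta_i$ and $\gamma$ are, the estimator of $\boldsymbol{\Sigma}$ is still $\mathbf{S} = (1/T)\mathbf{E}'\mathbf{E}$. So, we can concentrate the log-likelihood function. The Oberhofer and Kmenta (1974) results imply that we may simply zigzag back and forth between $\mathbf{S}$ and $(\hat{\boldsymbol{\beta}}, \hat{\gamma})$. (See Section 11.7.2.) Moreover, although maximization over $(\boldsymbol{\beta}, \gamma)$ remains complicated, maximization over $\boldsymbol{\beta}$ for known $\gamma$ is trivial. For a given value of $\gamma$, the maximum likelihood estimator of $\beta_i$ is the slope in the linear regression without a constant term of $(R_{it}-\gamma)$ on $(R_{mt}-\gamma)$. Thus, the full set of maximum likelihood estimators may be found just by scanning over the admissible range of $\gamma$ to locate the value that maximizes

$$\ln L_c = -\tfrac{1}{2}\ln|\mathbf{S}(\gamma)|,$$

where

$$s_{ij}(\gamma) = \frac{\sum_{t=1}^{T}\left\{R_{it}-\gamma[1-\hat{\beta}_i(\gamma)]-\hat{\beta}_i(\gamma)R_{mt}\right\}\left\{R_{jt}-\gamma[1-\hat{\beta}_j(\gamma)]-\hat{\beta}_j(\gamma)R_{mt}\right\}}{T},$$

and

$$\hat{\beta}_i(\gamma) = \frac{\sum_{t=1}^{T}(R_{it}-\gamma)(R_{mt}-\gamma)}{\sum_{t=1}^{T}(R_{mt}-\gamma)^2}.$$

For inference purposes, an estimator of the asymptotic covariance matrix of the estimators is required. The log-likelihood for this model is

$$\ln L = -\frac{T}{2}\left[N\ln 2\pi + \ln|\boldsymbol{\Sigma}|\right] - \frac{1}{2}\sum_{t=1}^{T}\boldsymbol{\varepsilon}_t'\boldsymbol{\Sigma}^{-1}\boldsymbol{\varepsilon}_t,$$

where the $N \times 1$ vector $\boldsymbol{\varepsilon}_t$ is $\varepsilon_{it} = [R_{it}-\gamma(1-\beta_i)-\beta_iR_{mt}]$, $i = 1, \ldots, N$. The derivatives of the log-likelihood can be written

$$\frac{\partial\ln L}{\partial[\boldsymbol{\beta}'\ \gamma]'} = \sum_{t=1}^{T}\begin{bmatrix} (R_{mt}-\gamma)\boldsymbol{\Sigma}^{-1}\boldsymbol{\varepsilon}_t \\ (\mathbf{i}-\boldsymbol{\beta})'\boldsymbol{\Sigma}^{-1}\boldsymbol{\varepsilon}_t \end{bmatrix} = \sum_{t=1}^{T}\mathbf{g}_t.$$

(We have omitted $\boldsymbol{\Sigma}$ from the gradient because the expected Hessian is block diagonal, and, at present, $\boldsymbol{\Sigma}$ is tangential.) With the derivatives in this form, we have

$$E[\mathbf{g}_t\mathbf{g}_t'] = \begin{bmatrix} (R_{mt}-\gamma)^2\boldsymbol{\Sigma}^{-1} & (R_{mt}-\gamma)\boldsymbol{\Sigma}^{-1}(\mathbf{i}-\boldsymbol{\beta}) \\ (R_{mt}-\gamma)(\mathbf{i}-\boldsymbol{\beta})'\boldsymbol{\Sigma}^{-1} & (\mathbf{i}-\boldsymbol{\beta})'\boldsymbol{\Sigma}^{-1}(\mathbf{i}-\boldsymbol{\beta}) \end{bmatrix}. \tag{14-27}$$

Now, sum this expression over $t$ and use the result that

$$\sum_{t=1}^{T}(R_{mt}-\gamma)^2 = \sum_{t=1}^{T}(R_{mt}-\bar{R}_m)^2 + T(\bar{R}_m-\gamma)^2 = T\left[s_{Rm}^2 + (\bar{R}_m-\gamma)^2\right]$$

to obtain the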
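negative of the expected Hessian,

$$-E\left[\frac{\partial^2\ln L}{\partial\binom{\boldsymbol{\beta}}{\gamma}\,\partial\binom{\boldsymbol{\beta}}{\gamma}'}\right] = T\begin{bmatrix} \left[s_{Rm}^2+(\bar{R}_m-\gamma)^2\right]\boldsymbol{\Sigma}^{-1} & (\bar{R}_m-\gamma)\boldsymbol{\Sigma}^{-1}(\mathbf{i}-\boldsymbol{\beta}) \\ (\bar{R}_m-\gamma)(\mathbf{i}-\boldsymbol{\beta})'\boldsymbol{\Sigma}^{-1} & (\mathbf{i}-\boldsymbol{\beta})'\boldsymbol{\Sigma}^{-1}(\mathbf{i}-\boldsymbol{\beta}) \end{bmatrix}. \tag{14-28}$$

The inverse of this matrix provides the estimator for the asymptotic covariance matrix. Using (A-74), after some manipulation we find that

$$\operatorname{Asy.Var}[\hat{\gamma}] = \frac{1}{T}\left[1 + \frac{(\mu_{Rm}-\gamma)^2}{\sigma_{Rm}^2}\right]\left[(\mathbf{i}-\boldsymbol{\beta})'\boldsymbol{\Sigma}^{-1}(\mathbf{i}-\boldsymbol{\beta})\right]^{-1},$$

where $\mu_{Rm} = \operatorname{plim}\bar{R}_m$ and $\sigma_{Rm}^2 = \operatorname{plim}s_{Rm}^2$.

A likelihood ratio test of the Black model requires the restricted estimates of the parameters. The unrestricted model is the SUR model for the real returns, $R_{it}$, on the market returns, $R_{mt}$, with $N$ free constants, $\alpha_i$, and $N$ free slopes, $\beta_i$. Result (14-24) provides the test statistic. Once the estimates of $\beta_i$ and $\gamma$ are obtained, the implied estimates of $\alpha_i$ are given by $\alpha_i = (1-\beta_i)\gamma$. With these estimates in hand, the LM statistic is exactly what it was before, although now all $2N$ derivatives will be required and $\mathbf{X}$ is $[\mathbf{i}, \mathbf{R}_m]$. The subscript $*$ indicates computation at the restricted estimates;

$$LM = T\left(\frac{s_{Rm}^2+\bar{R}_m^2}{s_{Rm}^2}\right)\bar{\mathbf{e}}_*'\mathbf{S}_*^{-1}\bar{\mathbf{e}}_* + \left(\frac{1}{Ts_{Rm}^2}\right)\mathbf{R}_m'\mathbf{E}_*\mathbf{S}_*^{-1}\mathbf{E}_*'\mathbf{R}_m - \left(\frac{2\bar{R}_m}{s_{Rm}^2}\right)\mathbf{R}_m'\mathbf{E}_*\mathbf{S}_*^{-1}\bar{\mathbf{e}}_*.$$

A Wald test of the Black model would be based on the unrestricted estimators. The hypothesis appears to involve the unknown $\gamma$, but in fact, the theory implies only the $N-1$ nonlinear restrictions: $[(\alpha_i/\alpha_N) - (1-\beta_i)/(1-\beta_N)] = 0$ or $[\alpha_i(1-\beta_N) - \alpha_N(1-\beta_i)] = 0$. Write this set of $N-1$ functions as $\mathbf{c}(\boldsymbol{\alpha}, \boldsymbol{\beta}) = \mathbf{0}$. The Wald statistic based on the least squares estimates would then be

$$W = \mathbf{c}(\mathbf{a}, \mathbf{b})'\left\{\operatorname{Est.Asy.Var}[\mathbf{c}(\mathbf{a}, \mathbf{b})]\right\}^{-1}\mathbf{c}(\mathbf{a}, \mathbf{b}).$$

Recall in the unrestricted model that $\operatorname{Asy.Var}[\mathbf{a}, \mathbf{b}] = (1/T)\operatorname{plim}(\mathbf{X}'\mathbf{X}/T)^{-1}\otimes\boldsymbol{\Sigma} = \boldsymbol{\Delta}$, say. Using the delta method (see Section D.2.7), the asymptotic covariance matrix for $\mathbf{c}(\mathbf{a}, \mathbf{b})$ would be

$$\operatorname{Asy.Var}[\mathbf{c}(\mathbf{a}, \mathbf{b})] = \boldsymbol{\Gamma}\boldsymbol{\Delta}\boldsymbol{\Gamma}' \quad\text{where}\quad \boldsymbol{\Gamma} = \frac{\partial\mathbf{c}(\boldsymbol{\alpha}, \boldsymbol{\beta})}{\partial(\boldsymbol{\alpha}, \boldsymbol{\beta})}.$$

The $i$th row of the $(N-1) \times 2N$ matrix $\boldsymbol{\Gamma}$ has only four nonzero elements, one each in the $i$th and $N$th positions of each of the two subvectors.

Before closing this lengthy example, we reconsider the assumptions of the model. There is ample evidence [e.g., Affleck-Graves and McDonald (1989)] that the normality assumption used in the preceding is not appropriate for financial returns. This fact in itself does not complicate the analysis very much. Although the estimators derived earlier are based on the normal likelihood, they are really only generalized least squares. As we have seen before (in Chapter 10), GLS is robust to distributional ass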
umptions. The LM and LR tests we devised are not, however. Without the normality assumption, only the Wald statistics retain their asymptotic validity. As noted, the small-sample behavior of the Wald statistic can be problematic. The approach we have used elsewhere is to use an approximation, $F = W/J$, where $J$ is the number of restrictions, and refer the statistic to the more conservative critical values of the $F[J, q]$ distribution, where $q$ is the number of degrees of freedom in estimation. Thus, once again, the role of the normality assumption is quite minor.

The homoscedasticity and nonautocorrelation assumptions are potentially more problematic. A failure of the latter almost certainly invalidates the entire model. [See Campbell, Lo, and MacKinlay (1997) for discussion.] If the disturbances are only heteroscedastic, then we can appeal to the well-established consistency of ordinary least squares in the generalized regression model. A GMM approach might seem to be called for, but GMM estimation in this context is irrelevant. In all cases, the parameters are exactly identified. What is needed is a robust covariance estimator for our now pseudomaximum likelihood estimators. For the Sharpe–Lintner formulation, nothing more than the White estimator that we developed in Chapters 10 and 11 is required; after all, despite the complications of the models, the estimators both with and without the restrictions are ordinary least squares, equation by equation. For each equation separately, the robust asymptotic covariance matrix in (10-14) applies. For the least squares estimators $\mathbf{q}_i = (a_i, b_i)$, we seek a robust estimator of

$$\operatorname{Asy.Cov}[\mathbf{q}_i, \mathbf{q}_j] = T\operatorname{plim}(\mathbf{X}'\mathbf{X})^{-1}\mathbf{X}'\boldsymbol{\varepsilon}_i\boldsymbol{\varepsilon}_j'\mathbf{X}(\mathbf{X}'\mathbf{X})^{-1}.$$

Assuming that $E[\varepsilon_{it}\varepsilon_{jt}] = \sigma_{ij}$, this matrix can be estimated with

$$\operatorname{Est.Asy.Cov}[\mathbf{q}_i, \mathbf{q}_j] = [(\mathbf{X}'\mathbf{X})^{-1}]\left[\sum_{t=1}^{T}\mathbf{x}_t\mathbf{x}_t'e_{it}e_{jt}\right][(\mathbf{X}'\mathbf{X})^{-1}].$$

To form a counterpart for the Black model, we will once again rely on the assumption that the asymptotic covariance of the MLE of $\boldsymbol{\Sigma}$ and the MLE of $(\boldsymbol{\beta}', \gamma)$ is zero. Then the “sandwich” estimator for this M estimator (see Section 17.8) is

$$\operatorname{Est.Asy.Var}(\hat{\boldsymbol{\beta}}, \hat{\gamma}) = \mathbf{A}^{-1}\mathbf{B}\mathbf{A}^{-1},$$

where $\mathbf{A}$ appears in (14-28) and $\mathbf{B}$ is in (14-27).

14.2.6 MAXIMUM LIKELIHOOD ESTIMATION OF THE SEEMINGLY UNRELATED REGRESSIONS MODEL WITH A BLOCK OF ZEROS IN THE COEFFICIENT MATRIX

In Section 14.2.2, we considered the special c
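ase of the SUR model with identical regressors in all equations. We showed there that in this case, OLS and GLS are identical. In the SUR model with normally distributed disturbances, GLS is the maximum likelihood estimator. It follows that when the regressors are identical, OLS is the maximum likelihood estimator. In this section, we consider a related case in which the coefficient matrix contains a block of zeros. The block of zeros is created by excluding the same subset of the regressors from some of but not all the equations in a model that without the exclusion restriction is a SUR with the same regressors in all equations.

This case can be examined in the context of the derivation of the GLS estimator in (14-7), but it is much simpler to obtain the result we seek for the maximum likelihood estimator. The model we have described can be formulated as in (14-16) as follows. We first transpose the equation system in (14-16) so that observation $t$ on $y_1, \ldots, y_M$ is written

$$\mathbf{y}_t = \boldsymbol{\Pi}\mathbf{x}_t + \boldsymbol{\varepsilon}_t.$$

If we collect all $T$ observations in this format, then the system would appear as

$$\underset{M\times T}{\mathbf{Y}'} = \underset{M\times K}{\boldsymbol{\Pi}}\ \underset{K\times T}{\mathbf{X}'} + \underset{M\times T}{\mathbf{E}'}.$$

(Each row of $\boldsymbol{\Pi}$ contains the parameters in a particular equation.) Now, consider once again a particular observation and partition the set of dependent variables into two groups of $M_1$ and $M_2$ variables and the set of regressors into two sets of $K_1$ and $K_2$ variables. The equation system is now

$$\begin{bmatrix}\mathbf{y}_1\\ \mathbf{y}_2\end{bmatrix}_t = \begin{bmatrix}\boldsymbol{\Pi}_{11} & \boldsymbol{\Pi}_{12}\\ \boldsymbol{\Pi}_{21} & \boldsymbol{\Pi}_{22}\end{bmatrix}\begin{bmatrix}\mathbf{x}_1\\ \mathbf{x}_2\end{bmatrix}_t + \begin{bmatrix}\boldsymbol{\varepsilon}_1\\ \boldsymbol{\varepsilon}_2\end{bmatrix}_t, \quad E\left[\begin{matrix}\boldsymbol{\varepsilon}_1\\ \boldsymbol{\varepsilon}_2\end{matrix}\,\Big|\,\mathbf{X}\right]_t = \begin{bmatrix}\mathbf{0}\\ \mathbf{0}\end{bmatrix}, \quad \operatorname{Var}\left[\begin{matrix}\boldsymbol{\varepsilon}_1\\ \boldsymbol{\varepsilon}_2\end{matrix}\,\Big|\,\mathbf{X}\right]_t = \begin{bmatrix}\boldsymbol{\Sigma}_{11} & \boldsymbol{\Sigma}_{12}\\ \boldsymbol{\Sigma}_{21} & \boldsymbol{\Sigma}_{22}\end{bmatrix}.$$

Since this system is still a SUR model with identical regressors, the maximum likelihood estimators of the parameters are obtained using equation by equation least squares regressions. The case we are interested in here is the restricted model, with $\boldsymbol{\Pi}_{12} = \mathbf{0}$, which has the effect of excluding $\mathbf{x}_2$ from all the equations for $\mathbf{y}_1$. The results we will obtain for this case are:

1. The maximum likelihood estimator of $\boldsymbol{\Pi}_{11}$ when $\boldsymbol{\Pi}_{12} = \mathbf{0}$ is equation-by-equation least squares regression of the variables in $\mathbf{y}_1$ on $\mathbf{x}_1$ alone. That is, even with the restriction, the efficient estimator of the parameters of the first set of equations is equation-by-equation ordinary least squares. Least squares is not the efficient estimator for the second set, however.
2. The effect of the restriction on the likelihood function can be isolated to its effect on the smaller set of equ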
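ations. Thus, the hypothesis can be tested without estimating the larger set of equations.

We begin by considering maximum likelihood estimation of the unrestricted system. The log-likelihood function for this multivariate regression model is

$$\ln L = \sum_{t=1}^{T}\ln f(\mathbf{y}_{1t}, \mathbf{y}_{2t} \mid \mathbf{x}_{1t}, \mathbf{x}_{2t}),$$

where $f(\mathbf{y}_{1t}, \mathbf{y}_{2t} \mid \mathbf{x}_{1t}, \mathbf{x}_{2t})$ is the joint normal density of the two vectors. This result is (14-17) through (14-19) in a different form. We will now write this joint normal density as the product of a marginal and a conditional:

$$f(\mathbf{y}_{1t}, \mathbf{y}_{2t} \mid \mathbf{x}_{1t}, \mathbf{x}_{2t}) = f(\mathbf{y}_{1t} \mid \mathbf{x}_{1t}, \mathbf{x}_{2t})\,f(\mathbf{y}_{2t} \mid \mathbf{y}_{1t}, \mathbf{x}_{1t}, \mathbf{x}_{2t}).$$

The mean and variance of the marginal distribution for $\mathbf{y}_{1t}$ are just the upper portions of the preceding partitioned matrices:

$$E[\mathbf{y}_{1t} \mid \mathbf{x}_{1t}, \mathbf{x}_{2t}] = \boldsymbol{\Pi}_{11}\mathbf{x}_{1t} + \boldsymbol{\Pi}_{12}\mathbf{x}_{2t}, \qquad \operatorname{Var}[\mathbf{y}_{1t} \mid \mathbf{x}_{1t}, \mathbf{x}_{2t}] = \boldsymbol{\Sigma}_{11}.$$

The results we need for the conditional distribution are given in Theorem B.6. Collecting terms, we have

$$\begin{aligned} E[\mathbf{y}_{2t} \mid \mathbf{y}_{1t}, \mathbf{x}_{1t}, \mathbf{x}_{2t}] &= \left[\boldsymbol{\Pi}_{21} - \boldsymbol{\Sigma}_{21}\boldsymbol{\Sigma}_{11}^{-1}\boldsymbol{\Pi}_{11}\right]\mathbf{x}_{1t} + \left[\boldsymbol{\Pi}_{22} - \boldsymbol{\Sigma}_{21}\boldsymbol{\Sigma}_{11}^{-1}\boldsymbol{\Pi}_{12}\right]\mathbf{x}_{2t} + \left[\boldsymbol{\Sigma}_{21}\boldsymbol{\Sigma}_{11}^{-1}\right]\mathbf{y}_{1t} \\ &= \boldsymbol{\Lambda}_{21}\mathbf{x}_{1t} + \boldsymbol{\Lambda}_{22}\mathbf{x}_{2t} + \boldsymbol{\Gamma}\mathbf{y}_{1t}, \\ \operatorname{Var}[\mathbf{y}_{2t} \mid \mathbf{y}_{1t}, \mathbf{x}_{1t}, \mathbf{x}_{2t}] &= \boldsymbol{\Sigma}_{22} - \boldsymbol{\Sigma}_{21}\boldsymbol{\Sigma}_{11}^{-1}\boldsymbol{\Sigma}_{12} = \boldsymbol{\Omega}_{22}. \end{aligned}$$

Finally, since the marginal distributions and the joint distribution are all multivariate normal, the conditional distribution is also. The objective of this partitioning is to partition the log-likelihood function likewise;

$$\begin{aligned} \ln L &= \sum_{t=1}^{T}\ln f(\mathbf{y}_{1t}, \mathbf{y}_{2t} \mid \mathbf{x}_{1t}, \mathbf{x}_{2t}) \\ &= \sum_{t=1}^{T}\ln\left[f(\mathbf{y}_{1t} \mid \mathbf{x}_{1t}, \mathbf{x}_{2t})\,f(\mathbf{y}_{2t} \mid \mathbf{y}_{1t}, \mathbf{x}_{1t}, \mathbf{x}_{2t})\right] \\ &= \sum_{t=1}^{T}\ln f(\mathbf{y}_{1t} \mid \mathbf{x}_{1t}, \mathbf{x}_{2t}) + \sum_{t=1}^{T}\ln f(\mathbf{y}_{2t} \mid \mathbf{y}_{1t}, \mathbf{x}_{1t}, \mathbf{x}_{2t}). \end{aligned}$$

With no restrictions on any of the parameters, we can maximize this log-likelihood by maximizing its parts separately. There are two multivariate regression systems defined by the two parts, and they have no parameters in common. Because $\boldsymbol{\Pi}_{21}$, $\boldsymbol{\Pi}_{22}$, $\boldsymbol{\Sigma}_{21}$, and $\boldsymbol{\Sigma}_{22}$ are all free, unrestricted parameters, there are no restrictions imposed on $\boldsymbol{\Lambda}_{21}$, $\boldsymbol{\Lambda}_{22}$, $\boldsymbol{\Gamma}$, or $\boldsymbol{\Omega}_{22}$. Therefore, in each case, the efficient estimators are equation-by-equation ordinary least squares. The first part produces estimates of $\boldsymbol{\Pi}_{11}$, $\boldsymbol{\Pi}_{12}$, and $\boldsymbol{\Sigma}_{11}$ directly. From the second, we would obtain estimates of $\boldsymbol{\Lambda}_{21}$, $\boldsymbol{\Lambda}_{22}$, $\boldsymbol{\Gamma}$, and $\boldsymbol{\Omega}_{22}$. But it is easy to see in the relationships above how the original parameters can be obtained from these mixtures:

$$\boldsymbol{\Pi}_{21} = \boldsymbol{\Lambda}_{21} + \boldsymbol{\Gamma}\boldsymbol{\Pi}_{11}, \quad \boldsymbol{\Pi}_{22} = \boldsymbol{\Lambda}_{22} + \boldsymbol{\Gamma}\boldsymbol{\Pi}_{12}, \quad \boldsymbol{\Sigma}_{21} = \boldsymbol{\Gamma}\boldsymbol{\Sigma}_{11}, \quad \boldsymbol{\Sigma}_{22} = \boldsymbol{\Omega}_{22} + \boldsymbol{\Gamma}\boldsymbol{\Sigma}_{11}\boldsymbol{\Gamma}'.$$

Because of the invariance of maximum likelihood estimators to transformation, these derived estimators of the original parameters are also maximum likelihood estimators. Thus, the result we have up to this poin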
t is that by manipulating this pair of sets of ordinary least squares estimators, we can obtain the original least squares, efficient estimators. This result is no surprise, of course, since we have just rearranged the original system and we are just rearranging our least squares estimators.

Now, consider estimation of the same system subject to the restriction $\boldsymbol{\Pi}_{12} = \mathbf{0}$. The second equation system is still completely unrestricted, so maximum likelihood estimates of its parameters, $\boldsymbol{\Lambda}_{21}$, $\boldsymbol{\Lambda}_{22}$ (which now equals $\boldsymbol{\Pi}_{22}$), $\boldsymbol{\Gamma}$, and $\boldsymbol{\Omega}_{22}$, are still obtained by equation-by-equation least squares. The equation systems have no parameters in common, so maximum likelihood estimators of the first set of parameters are obtained by maximizing the first part of the log-likelihood, once again, by equation-by-equation ordinary least squares. Thus, our first result is established. To establish the second result, we must obtain the two parts of the log-likelihood. The log-likelihood function for this model is given in (14-20). Since each of the two sets of equations is estimated by least squares, in each case (null and alternative), for each part, the term in the log-likelihood is the concentrated log-likelihood given in (14-22), where $\mathbf{W}_{jj}$ is $(1/T)$ times the matrix of sums of squares and cross products of least squares residuals. The second set of equations is estimated by regressions on $\mathbf{x}_1$, $\mathbf{x}_2$, and $\mathbf{y}_1$ with or without the restriction $\boldsymbol{\Pi}_{12} = \mathbf{0}$. So, the second part of the log-likelihood is always the same,

$$\ln L_{2c} = -\frac{T}{2}\left[M_2(1+\ln 2\pi) + \ln|\mathbf{W}_{22}|\right].$$

The concentrated log-likelihood for the first set of equations equals

$$\ln L_{1c} = -\frac{T}{2}\left[M_1(1+\ln 2\pi) + \ln|\mathbf{W}_{11}|\right],$$

when $\mathbf{x}_2$ is included in the equations, and the same with $\mathbf{W}_{11}(\boldsymbol{\Pi}_{12} = \mathbf{0})$ when $\mathbf{x}_2$ is excluded. At the maximum likelihood estimators, the log-likelihood for the whole system is

$$\ln L_c = \ln L_{1c} + \ln L_{2c}.$$

The likelihood ratio statistic is

$$\lambda = -2\left[(\ln L_c \mid \boldsymbol{\Pi}_{12} = \mathbf{0}) - (\ln L_c)\right] = T\left[\ln|\mathbf{W}_{11}(\boldsymbol{\Pi}_{12} = \mathbf{0})| - \ln|\mathbf{W}_{11}|\right].$$

This establishes our second result, since $\mathbf{W}_{11}$ is based only on the first set of equations.

The block of zeros case was analyzed by Goldberger (1970). Many regression systems in which the result might have proved useful (e.g., systems of demand equations) imposed cross-equation equality (symmetry) restrictions, so the result of the analysis was often derailed. Goldberg
er's result, however, is precisely what is needed in the more recent application of testing for Granger causality in the context of vector autoregressions. We will return to the issue in Section 19.6.5.

14.2.7 AUTOCORRELATION AND HETEROSCEDASTICITY

The seemingly unrelated regressions model can be extended to allow for autocorrelation in the same fashion as in Section 13.9.5. To reiterate, suppose that

$$\mathbf{y}_i = \mathbf{X}_i\boldsymbol{\beta}_i + \boldsymbol{\varepsilon}_i, \qquad \varepsilon_{it} = \rho_i\varepsilon_{i,t-1} + u_{it},$$

where $u_{it}$ is uncorrelated across observations. This extension will imply that the blocks in $\boldsymbol{\Omega}$ in (14-3), instead of $\sigma_{ij}\mathbf{I}$, are $\sigma_{ij}\boldsymbol{\Omega}_{ij}$, where $\boldsymbol{\Omega}_{ij}$ is given in (13-63).

The treatment developed by Parks (1967) is the one we used earlier.¹⁹ It calls for a three-step approach:

1. Estimate each equation in the system by ordinary least squares. Compute any consistent estimators of $\rho$. For each equation, transform the data by the Prais–Winsten transformation to remove the autocorrelation.²⁰ Note that there will not be a constant term in the transformed data because there will be a column with $(1-r_i^2)^{1/2}$ as the first observation and $(1-r_i)$ for the remainder.
2. Using the transformed data, use ordinary least squares again to estimate $\boldsymbol{\Sigma}$.
3. Use FGLS based on the estimated $\boldsymbol{\Sigma}$ and the transformed data.

There is no benefit to iteration. The estimator is efficient at every step, and iteration does not produce a maximum likelihood estimator because of the Jacobian term in the log likelihood [see (12-30)]. After the last step, $\boldsymbol{\Sigma}$ should be reestimated with the GLS estimates. The estimated covariance matrix for $\boldsymbol{\varepsilon}$ can then be reconstructed using

$$\hat{\sigma}_{mn}(\boldsymbol{\varepsilon}) = \frac{\hat{\sigma}_{mn}}{1-r_mr_n}.$$

As in the single equation case, opinions differ on the appropriateness of such corrections for autocorrelation. At one extreme is Mizon (1995), who argues forcefully that autocorrelation arises as a consequence of a remediable failure to include dynamic effects in the model. However, in a system of equations, the analysis that leads to this conclusion is going to be far more complex than in a single equation model.²¹ Suffice it to say, the issue remains to be settled conclusively.

¹⁹ Guilkey and Schmidt (1973), Guilkey (1974) and Berndt and Savin (1977) present an alternative treatment based on $\boldsymbol{\varepsilon}_t = \mathbf{R}\boldsymbol{\varepsilon}_{t-1} + \mathbf{u}_t$, where $\boldsymbol{\varepsilon}_t$ is the $M \times 1$ vector of disturbances at time $t$ and $\mathbf{R}$ is a correlation matrix. Extensions and additional results appear in Moschini and Moro (1994), McLaren (1996), and Holt (1998).
²⁰ There is a complication with the first observation that is not treated quite correctly by this procedure. For details, see Judge et al. (1985, pp. 486–489). The strictly correct (and quite cumbersome) results are for the true GLS estimator, which assumes a known $\boldsymbol{\Omega}$. It is unlikely that in a finite sample, anything is lost by using the Prais–Winsten procedure with the estimated $\boldsymbol{\Omega}$. One suggestion has been to use the Cochrane–Orcutt procedure and drop the first observation. But in a small sample, the cost of discarding the first observation is almost surely greater than that of neglecting to account properly for the correlation of the first disturbance with the other first disturbances.

Example 14.3 Autocorrelation in a SUR Model

Table 14.4 presents the autocorrelation-corrected estimates of the model of Example 14.2. The Durbin–Watson statistics for the five data sets given here, with the exception of Chrysler, strongly suggest that there is, indeed, autocorrelation in the disturbances. The differences between these and the uncorrected estimates given earlier are sometimes relatively large, as might be expected, given the fairly high autocorrelation and small sample size. The smaller diagonal elements in the disturbance covariance matrix compared with those of Example 14.2 reflect the improved fit brought about by introducing the lagged variables into the equation.

TABLE 14.4 Autocorrelation Coefficients

                    GM          CH          GE          WE          US
Durbin–Watson      0.9375      1.984       1.0721      1.413       0.9091
Autocorrelation    0.531       0.008       0.463       0.294       0.545

Residual Covariance Matrix [σ̂_ij/(1 − r_i r_j)]

          GM          CH          GE          WE          US
GM      6679.5
CH      −220.97     151.96
GE       483.79      43.7891    684.59
WE        88.373     19.964     190.37       92.788
US     −1381.6      342.89     1484.10      676.88     8638.1

Parameter Estimates (Standard Errors in Parentheses)

          GM          CH          GE          WE          US
β1      −51.337     −0.4536    −24.913      4.7091     14.0207
       (80.62)     (11.86)     (25.67)     (6.510)     (96.49)
β2       0.094038    0.06847     0.04271     0.05091     0.16415
       (0.01733)   (0.0174)    (0.01134)   (0.01060)   (0.0386)
β3       0.040723    0.32041     0.10954     0.04284     0.2006
       (0.04216)   (0.0258)    (0.03012)   (0.04127)   (0.1428)

In principle, the SUR model can accommodate heteroscedasticity as well as autocorrelation. Bartels and Fiebig (1991) suggested the generalized SUR model, $\boldsymbol{\Omega} = \mathbf{A}[\boldsymbol{\Sigma}\otimes\mathbf{I}]\mathbf{A}'$ where $\mathbf{A}$ is a block diagonal matrix. Ideally, $\mathbf{A}$ is made a function of m
ea-suredcharacteristicsoftheindividualandaseparateparametervector,θ,sothatthemodelcanbeestimatedinstages.Inafirststep,OLSresidualscouldbeusedtoformapreliminaryestimatorofθ,thenthedataaretransformedtohomoscedasticity,leavingandβtobeestimatedatsubsequentstepsusingtransformeddata.Oneapplica-tionalongtheselinesistherandomparametersmodelofFeibig,BartelsandAigner(1991)—(13-46)showshowtherandomparametersmodelinducesheteroscedastic-ity.AnotherapplicationisMandyandMartins–Filho,whospecifiedσ(t)=αz(t).ijijij(Thelinearspecificationofavariancedoespresentsomeproblems,asanegativevalueisnotprecluded.)KumbhakarandHeshmati(1996)proposedacostanddemand21DynamicSURmodelsinthespiritofMizon’sadmonitionwereproposedbyAndersonandBlundell(1982).AfewrecentapplicationsareKiviet,Phillips,andSchipp(1995)andDeschamps(1998).However,relativelylittleworkhasbeendonewithdynamicSURmodels.TheVARmodelsinChapter20areanimportantgroupofapplications,buttheycomefromadifferentanalyticalframework.\nGreene-50240bookJune19,200210:4362CHAPTER14✦SystemsofRegressionEquationssystemthatcombinedthetranslogmodelofSection14.3.2withthecompleteequationsystemin14.3.1.Intheirapplication,onlythecostequationwasspecifiedtoincludeaheteroscedasticdisturbance.14.3SYSTEMSOFDEMANDEQUATIONS:SINGULARSYSTEMSMostoftherecentapplicationsofthemultivariateregressionmodel22havebeeninthecontextofsystemsofdemandequations,eithercommoditydemandsorfactordemandsinstudiesofproduction.Example14.4Stone’sExpenditureSystemStone’sexpendituresystem23basedonasetoflogarithmiccommoditydemandequations,incomeY,andcommoditypricespnisMY∗pjlogqi=αi+ηilog+ηijlog,PPj=1wherePisageneralized(share-weighted)priceindex,ηisanincomeelasticity,andη∗iijisacompensatedpriceelasticity.Wecaninterpretthissystemasthedemandequationinrealexpenditureandrealprices.Theresultingsetofequationsconstitutesaneconometricmodelintheformofasetofseeminglyunrelatedregressions.Inestimation,wemustaccountforanumberofrestrictionsincludinghomogeneityofdegreeoneinincome,iηi=1,andsymmetryofthematrixof
compensated price elasticities, $\eta^*_{ij} = \eta^*_{ji}$.

Other examples include the system of factor demands and factor cost shares from production, which we shall consider again later. In principle, each is merely a particular application of the model of the previous section. But some special problems arise in these settings. First, the parameters of the systems are generally constrained across equations. That is, the unconstrained model is inconsistent with the underlying theory.[24] The numerous constraints in the system of demand equations presented earlier give an example. A second intrinsic feature of many of these models is that the disturbance covariance matrix $\Sigma$ is singular.

[22] Note the distinction between the multivariate or multiple-equation model discussed here and the multiple regression model.

[23] A very readable survey of the estimation of systems of commodity demands is Deaton and Muellbauer (1980). The example discussed here is taken from their Chapter 3 and the references to Stone’s (1954a,b) work cited therein. A counterpart for production function modeling is Chambers (1988). Recent developments in the specification of systems of demand equations include Chavas and Segerson (1987), Brown and Walker (1995), and Fry, Fry, and McLaren (1996).

[24] This inconsistency does not imply that the theoretical restrictions are not testable or that the unrestricted model cannot be estimated. Sometimes, the meaning of the model is ambiguous without the restrictions, however. Statistically rejecting the restrictions implied by the theory, which were used to derive the econometric model in the first place, can put us in a rather uncomfortable position. For example, in a study of utility functions, Christensen, Jorgenson, and Lau (1975), after rejecting the cross-equation symmetry of a set of commodity demands, stated, “With this conclusion we can terminate the test sequence, since these results invalidate the theory of demand” (p. 380). See Silver and Ali (1989) for discussion of testing symmetry restrictions.

14.3.1 COBB–DOUGLAS COST FUNCTION (EXAMPLE 7.3 CONTINUED)

Consider a Cobb–Douglas production function,

$$Y = \alpha_0 \prod_{i=1}^{M} x_i^{\alpha_i}.$$

Profit maximization with an exogenously determined output price calls for the firm to maximize output for a given cost level $C$ (or minimize costs for a given output $Y$). The Lagrangean for the maximization problem is

$$\Lambda = \alpha_0 \prod_{i=1}^{M} x_i^{\alpha_i} + \lambda(C - p'x),$$

where $p$ is the vector of $M$ factor prices. The necessary conditions for maximizing this function are

$$\frac{\partial\Lambda}{\partial x_i} = \frac{\alpha_i Y}{x_i} - \lambda p_i = 0 \quad\text{and}\quad \frac{\partial\Lambda}{\partial\lambda} = C - p'x = 0.$$

The joint solution provides $x_i(Y, p)$ and $\lambda(Y, p)$. The total cost of production is

$$\sum_{i=1}^{M} p_i x_i = \sum_{i=1}^{M} \frac{\alpha_i Y}{\lambda}.$$

The cost share allocated to the $i$th factor is

$$\frac{p_i x_i}{\sum_{i=1}^{M} p_i x_i} = \frac{\alpha_i}{\sum_{i=1}^{M}\alpha_i} = \beta_i. \qquad (14\text{-}29)$$

The full model is[25]

$$\ln C = \beta_0 + \beta_y \ln Y + \sum_{i=1}^{M}\beta_i \ln p_i + \varepsilon_c,$$
$$s_i = \beta_i + \varepsilon_i, \quad i = 1, \ldots, M. \qquad (14\text{-}30)$$

By construction, $\sum_{i=1}^{M}\beta_i = 1$ and $\sum_{i=1}^{M}s_i = 1$. (This is the cost function analysis begun in Example 7.3. We will return to that application below.) The cost shares will also sum identically to one in the data. It therefore follows that $\sum_{i=1}^{M}\varepsilon_i = 0$ at every data point, so the system is singular. For the moment, ignore the cost function. Let the $M \times 1$ disturbance vector from the shares be $\varepsilon = [\varepsilon_1, \varepsilon_2, \ldots, \varepsilon_M]'$. Since $\varepsilon' i = 0$, where $i$ is a column of 1s, it follows that $E[\varepsilon\varepsilon' i] = \Sigma i = 0$, which implies that $\Sigma$ is singular. Therefore, the methods of the previous sections cannot be used here. (You should verify that the sample covariance matrix of the OLS residuals will also be singular.)

The solution to the singularity problem appears to be to drop one of the equations, estimate the remainder, and solve for the last parameter from the other $M - 1$. The constraint $\sum_{i=1}^{M}\beta_i = 1$ states that the cost function must be homogeneous of degree one in the prices, a theoretical necessity. If we impose the constraint

$$\beta_M = 1 - \beta_1 - \beta_2 - \cdots - \beta_{M-1}, \qquad (14\text{-}31)$$

then the system is reduced to a nonsingular one:

$$\log\left(\frac{C}{p_M}\right) = \beta_0 + \beta_y \log Y + \sum_{i=1}^{M-1}\beta_i \log\left(\frac{p_i}{p_M}\right) + \varepsilon_c,$$
$$s_i = \beta_i + \varepsilon_i, \quad i = 1, \ldots, M-1.$$

This system provides estimates of $\beta_0$, $\beta_y$, and $\beta_1, \ldots, \beta_{M-1}$. The last parameter is estimated using (14-31). In principle, it is immaterial which factor is chosen as the numeraire. Unfortunately, the FGLS parameter estimates in the now nonsingular system will depend on which one is chosen. Invariance is achieved by using maximum likelihood estimates instead of FGLS,[26] which can be obtained by iterating FGLS or by direct maximum likelihood estimation.[27]

[25] We leave as an exercise the derivation of $\beta_0$, which is a mixture of all the parameters, and $\beta_y$, which equals $1/\sum_m \alpha_m$.

Nerlove’s (1963) study of the electric power industry that we examined in
Example 7.3 provides an application of the Cobb–Douglas cost function model. His ordinary least squares estimates of the parameters were listed in Example 7.3. Among the results are (unfortunately) a negative capital coefficient in three of the six regressions. Nerlove also found that the simple Cobb–Douglas model did not adequately account for the relationship between output and average cost. Christensen and Greene (1976) further analyzed the Nerlove data and augmented the data set with cost share data to estimate the complete demand system. Appendix Table F14.2 lists Nerlove’s 145 observations with Christensen and Greene’s cost share data. Cost is the total cost of generation in millions of dollars, output is in millions of kilowatt-hours, the capital price is an index of construction costs, the wage rate is in dollars per hour for production and maintenance, the fuel price is an index of the cost per Btu of fuel purchased by the firms, and the data reflect the 1955 costs of production. The regression estimates are given in Table 14.5.

Least squares estimates of the Cobb–Douglas cost function are given in the first column.[28] The coefficient on capital is negative. Because $\beta_i = \beta_y\,\partial\ln Y/\partial\ln x_i$ (that is, a positive multiple of the output elasticity of the $i$th factor), this finding is troubling. The third column gives the maximum likelihood estimates obtained in the constrained system. Two things to note are the dramatically smaller standard errors and the now positive (and reasonable) estimate of the capital coefficient. The estimates of economies of scale in the basic Cobb–Douglas model are $1/\beta_y = 1.39$ (column 1) and $1.25$ (column 3), which suggest some increasing returns to scale. Nerlove, however, had found evidence that at extremely large firm sizes, economies of scale diminished and eventually disappeared. To account for this (essentially a classical U-shaped average cost curve), he appended a quadratic term in log output in the cost function. The single equation and maximum likelihood multivariate regression estimates are given in the second and fourth sets of results.

[26] The invariance result is proved in Barten (1969).

[27] Some additional results on the method are given by Revankar (1976).

[28] Results based on Nerlove’s full data set are given in Example 7.3. We have recomputed the values given in Table 14.5. Note that Nerlove used base 10 logs while we have used natural logs in our computations.

TABLE 14.5 Regression Estimates (Standard Errors in Parentheses)

                 Ordinary Least Squares                    Multivariate Regression
                 (1)                 (2)                   (3)                 (4)
$\beta_0$        −4.686 (0.885)      −3.764 (0.702)        −7.281 (0.104)      −5.962 (0.161)
$\beta_q$        0.721 (0.0174)      0.153 (0.0618)        0.798 (0.0147)      0.303 (0.0570)
$\beta_{qq}$     -                   0.0505 (0.00536)      -                   0.0414 (0.00493)
$\beta_k$        −0.00847 (0.191)    0.0739 (0.150)        0.424 (0.00945)     0.424 (0.00943)
$\beta_l$        0.594 (0.205)       0.481 (0.161)         0.106 (0.00380)     0.106 (0.00380)
$\beta_f$        0.414 (0.0989)      0.445 (0.0777)        0.470 (0.0100)      0.470 (0.0100)
$R^2$            0.9516              0.9581                -                   -
Log $|W|$        -                   -                     −12.6726            −13.02248

[FIGURE 14.3 Predicted and Actual Average Costs. Panel title: Estimated Average Cost Function. Vertical axis: Unit Cost. Horizontal axis: MWH. Series: Fitted, Actual.]

The quadratic output term gives the cost function the expected U-shape. We can determine the point where average cost reaches its minimum by equating $\partial\ln C/\partial\ln q$ to 1. This is $q^* = \exp[(1 - \beta_q)/(2\beta_{qq})]$. For the multivariate regression, this value is $q^* = 4527$. About 85 percent of the firms in the sample had output less than this, so by these estimates, most firms in the sample had not yet exhausted the available economies of scale. Figure 14.3 shows predicted and actual average costs for the sample. (In order to obtain a reasonable scale, the smallest one third of the firms are omitted from the figure. Predicted average costs are computed at the sample averages of the input prices.) The figure does reveal that beyond a quite small scale, the economies of scale, while perhaps statistically significant, are economically quite small.

14.3.2 FLEXIBLE FUNCTIONAL FORMS: THE TRANSLOG COST FUNCTION

The literatures on production and cost and on utility and demand have evolved in several directions. In the area of models of producer behavior, the classic paper by Arrow et al. (1961) called into question the inherent restriction of the Cobb–Douglas model that all elasticities of factor substitution are equal to 1. Researchers have since developed numerous flexible functions that allow substitution to be unrestricted (i.e., not even constant).[29] Similar strands of literature have appeared in the analysis of commodity demands.[30] In this section
, we examine in detail a model of production.

Suppose that production is characterized by a production function, $Y = f(x)$. The solution to the problem of minimizing the cost of producing a specified output rate given a set of factor prices produces the cost-minimizing set of factor demands $x_i = x_i(Y, p)$. The total cost of production is given by the cost function,

$$C = \sum_{i=1}^{M} p_i x_i(Y, p) = C(Y, p). \qquad (14\text{-}32)$$

If there are constant returns to scale, then it can be shown that $C = Yc(p)$ or $C/Y = c(p)$, where $c(p)$ is the unit or average cost function.[31] The cost-minimizing factor demands are obtained by applying Shephard’s (1970) lemma, which states that if $C(Y, p)$ gives the minimum total cost of production, then the cost-minimizing set of factor demands is given by

$$x_i^* = \frac{\partial C(Y, p)}{\partial p_i} = \frac{Y\,\partial c(p)}{\partial p_i}. \qquad (14\text{-}33)$$

Alternatively, by differentiating logarithmically, we obtain the cost-minimizing factor cost shares:

$$s_i = \frac{\partial\log C(Y, p)}{\partial\log p_i} = \frac{p_i x_i}{C}. \qquad (14\text{-}34)$$

With constant returns to scale, $\ln C(Y, p) = \log Y + \log c(p)$, so

$$s_i = \frac{\partial\log c(p)}{\partial\log p_i}. \qquad (14\text{-}35)$$

[29] See, in particular, Berndt and Christensen (1973). Two useful surveys of the topic are Jorgenson (1983) and Diewert (1974).

[30] See, for example, Christensen, Jorgenson, and Lau (1975) and two surveys, Deaton and Muellbauer (1980) and Deaton (1983). Berndt (1990) contains many useful results.

[31] The Cobb–Douglas function of the previous section gives an illustration. The restriction of constant returns to scale is $\beta_y = 1$, which is equivalent to $C = Yc(p)$. Nerlove’s more general version of the cost function allows nonconstant returns to scale. See Christensen and Greene (1976) and Diewert (1974) for some of the formalities of the cost function and its relationship to the structure of production.

In many empirical studies, the objects of estimation are the elasticities of factor substitution and the own price elasticities of demand, which are given by

$$\theta_{ij} = \frac{c\,(\partial^2 c/\partial p_i\,\partial p_j)}{(\partial c/\partial p_i)(\partial c/\partial p_j)}$$

and

$$\eta_{ii} = s_i\theta_{ii}.$$

By suitably parameterizing the cost function (14-32) and the cost shares (14-34), we obtain an $M$ or $M + 1$ equation econometric model that can be used to estimate these quantities.[32]

The transcendental logarithmic, or translog, function is the most frequently used flexible function in empirical work.[33] By expanding $\log c(p)$ in a second-order Taylor series about the point $\log p = 0$, we obtain

$$\log c \approx \beta_0 + \sum_{i=1}^{M}\left(\frac{\partial\log c}{\partial\log p_i}\right)\log p_i + \frac{1}{2}\sum_{i=1}^{M}\sum_{j=1}^{M}\left(\frac{\partial^2\log c}{\partial\log p_i\,\partial\log p_j}\right)\log p_i\log p_j, \qquad (14\text{-}36)$$

where all derivatives are evaluated at the expansion point. If we identify these derivatives as coefficients and impose the symmetry of the cross-price derivatives, then the cost function becomes

$$\log c = \beta_0 + \beta_1\log p_1 + \cdots + \beta_M\log p_M + \delta_{11}\left(\tfrac{1}{2}\log^2 p_1\right) + \delta_{12}\log p_1\log p_2 + \delta_{22}\left(\tfrac{1}{2}\log^2 p_2\right) + \cdots + \delta_{MM}\left(\tfrac{1}{2}\log^2 p_M\right). \qquad (14\text{-}37)$$

This is the translog cost function. If $\delta_{ij}$ equals zero, then it reduces to the Cobb–Douglas function we looked at earlier. The cost shares are given by

$$s_1 = \frac{\partial\log c}{\partial\log p_1} = \beta_1 + \delta_{11}\log p_1 + \delta_{12}\log p_2 + \cdots + \delta_{1M}\log p_M,$$
$$s_2 = \frac{\partial\log c}{\partial\log p_2} = \beta_2 + \delta_{12}\log p_1 + \delta_{22}\log p_2 + \cdots + \delta_{2M}\log p_M,$$
$$\vdots \qquad (14\text{-}38)$$
$$s_M = \frac{\partial\log c}{\partial\log p_M} = \beta_M + \delta_{1M}\log p_1 + \delta_{2M}\log p_2 + \cdots + \delta_{MM}\log p_M.$$

[32] The cost function is only one of several approaches to this study. See Jorgenson (1983) for a discussion.

[33] See Example 2.4. The function was developed by Kmenta (1967) as a means of approximating the CES production function and was introduced formally in a series of papers by Berndt, Christensen, Jorgenson, and Lau, including Berndt and Christensen (1973) and Christensen et al. (1975). The literature has produced something of a competition in the development of exotic functional forms. The translog function has remained the most popular, however, and by one account, Guilkey, Lovell, and Sickles (1983), is the most reliable of several available alternatives. See also Example 6.2.

The cost shares must sum to 1, which requires, in addition to the symmetry restrictions already imposed,

$$\beta_1 + \beta_2 + \cdots + \beta_M = 1,$$
$$\sum_{i=1}^{M}\delta_{ij} = 0 \quad\text{(column sums equal zero)}, \qquad (14\text{-}39)$$
$$\sum_{j=1}^{M}\delta_{ij} = 0 \quad\text{(row sums equal zero)}.$$

The system of share equations provides a seemingly unrelated regressions model that can be used to estimate the parameters of the model.[34] To make the model operational, we must impose the restrictions in (14-39) and solve the problem of singularity of the disturbance covariance matrix of the share equations. The first is accomplished by dividing the first $M - 1$ prices by the $M$th, thus eliminating the last term in each row and column of the parameter matrix. As in the Cobb–Douglas model, we obtain a nonsingular system by dropping the $M$th share equation. We compute maximum likelihood estimates of the parameters to ensure invariance with respect to the choice of which share equation we drop.

For the translog cost function, the elasticities of substitution are particularly simple to compute once the parameters have been estimated:

$$\theta_{ij} = \frac{\delta_{ij} + s_i s_j}{s_i s_j}, \qquad \theta_{ii} = \frac{\delta_{ii} + s_i(s_i - 1)}{s_i^2}. \qquad (14\text{-}40)$$

These elasticities will differ at every data point. It is common to compute them at some central point such as the means of the data.[35]

Example 14.5 A Cost Function for U.S. Manufacturing

A number of recent studies using the translog methodology have used a four-factor model, with capital $K$, labor $L$, energy $E$, and materials $M$, the factors of production. Among the first studies to employ this methodology was Berndt and Wood’s (1975) estimation of a translog cost function for the U.S. manufacturing sector. The three factor shares used to estimate the model are

$$s_K = \beta_K + \delta_{KK}\log\left(\frac{p_K}{p_M}\right) + \delta_{KL}\log\left(\frac{p_L}{p_M}\right) + \delta_{KE}\log\left(\frac{p_E}{p_M}\right),$$
$$s_L = \beta_L + \delta_{KL}\log\left(\frac{p_K}{p_M}\right) + \delta_{LL}\log\left(\frac{p_L}{p_M}\right) + \delta_{LE}\log\left(\frac{p_E}{p_M}\right),$$
$$s_E = \beta_E + \delta_{KE}\log\left(\frac{p_K}{p_M}\right) + \delta_{LE}\log\left(\frac{p_L}{p_M}\right) + \delta_{EE}\log\left(\frac{p_E}{p_M}\right).$$

[34] The cost function may be included, if desired, which will provide an estimate of $\beta_0$ but is otherwise inessential. Absent the assumption of constant returns to scale, however, the cost function will contain parameters of interest that do not appear in the share equations. As such, one would want to include it in the model. See Christensen and Greene (1976) for an example.

[35] They will also be highly nonlinear functions of the parameters and the data. A method of computing asymptotic standard errors for the estimated elasticities is presented in Anderson and Thursby (1986).

TABLE 14.6 Parameter Estimates (Standard Errors in Parentheses)

$\beta_K$        0.05690    (0.00134)      $\delta_{KM}$    −0.0189    (0.00971)
$\beta_L$        0.2534     (0.00210)      $\delta_{LL}$    0.07542    (0.00676)
$\beta_E$        0.0444     (0.00085)      $\delta_{LE}$    −0.00476   (0.00234)
$\beta_M$        0.6542     (0.00330)      $\delta_{LM}$    −0.07061   (0.01059)
$\delta_{KK}$    0.02951    (0.00580)      $\delta_{EE}$    0.01838    (0.00499)
$\delta_{KL}$    −0.000055  (0.00385)      $\delta_{EM}$    −0.00299   (0.00799)
$\delta_{KE}$    −0.01066   (0.00339)      $\delta_{MM}$    0.09237    (0.02247)

TABLE 14.7 Estimated Elasticities

                     Capital     Labor       Energy      Materials
Cost Shares for 1959
Fitted share         0.05643     0.27451     0.04391     0.62515
Actual share         0.06185     0.27303     0.04563     0.61948

Implied Elasticities of Substitution
Capital              −7.783
Labor                0.9908      −1.643
Energy               −3.230      0.6021      −12.19
Materials            0.4581      0.5896      0.8834      −0.3623

Implied Own Price Elasticities ($s_m\theta_{mm}$)
                     −0.4392     −0.4510     −0.5353     −0.2265

Berndt and Wood’s data are reproduced in Appendix Table F14.1. Maximum likelihood estimates of the full set of parameters are given in Table 14.6.[36]

The implied estimates of the elasticities of substitution and demand for 1959 (the central year in the data) are derived in Table 14.7 using the fitted cost shares. The departure from the Cobb–Douglas model with unit elasticities is substantial. For example, the results suggest almost no substitutability between energy and labor[37] and some complementarity between capital and energy.

[36] These estimates are not the same as those reported by Berndt and Wood. To purge their data of possible correlation with the disturbances, they first regressed the prices on 10 exogenous macroeconomic variables, such as U.S. population, government purchases of labor services, real exports of durable goods, and U.S. tangible capital stock, and then based their analysis on the fitted values. The estimates given here are, in general, quite close to those given by Berndt and Wood. For example, their estimates of the first five parameters are 0.0564, 0.2539, 0.0442, 0.6455, and 0.0254.

[37] Berndt and Wood’s estimate of $\theta_{EL}$ for 1959 is 0.64.

14.4 NONLINEAR SYSTEMS AND GMM ESTIMATION

We now consider estimation of nonlinear systems of equations. The underlying theory is essentially the same as that for linear systems. We briefly consider two cases in this section, maximum likelihood (or FGLS) estimation and GMM estimation. Since the theory is essentially that of Section 14.2.4, most of the following will describe practical aspects of estimation.

Consider estimation of the parameters of the equation system

$$y_1 = h_1(\beta, X) + \varepsilon_1,$$
$$y_2 = h_2(\beta, X) + \varepsilon_2,$$
$$\vdots \qquad (14\text{-}41)$$
$$y_M = h_M(\beta, X) + \varepsilon_M.$$

There are $M$ equations in total, to be estimated with $t = 1, \ldots, T$ observations. There are $K$ parameters in the model. No assumption is made that each equation has “its own” parameter vector; we simply use some or all of the $K$ elements in $\beta$ in each equation. Likewise, there is a set of $T$ observations on each of $P$ independent variables $x_p$, $p = 1, \ldots, P$, some or all of which appear in each equation. For convenience, the equations are written generically in terms of the full $\beta$ and $X$. The disturbances are assumed to have zero means and contemporaneous covariance matrix $\Sigma$. We will leave the extension to autocorrelation for more advanced treatments.

14.4.1 GLS ESTIMATION

In the multivariate regression model, if $\Sigma$ is known, then the generalized least squares estimator of $\beta$ is the vector that minimizes the generalized sum of squares

$$\varepsilon(\beta)'\Omega^{-1}\varepsilon(\beta) = \sum_{i=1}^{M}\sum_{j=1}^{M}\sigma^{ij}\,[y_i - h_i(\beta, X)]'[y_j - h_j(\beta, X)], \qquad (14\text{-}42)$$

where $\varepsilon(\beta)$ is an $MT \times 1$ vector of disturbances obtained by stacking the equations and $\Omega = \Sigma \otimes I$. [See (14-3).] As we did in Chapter 9, define the pseudoregressors as the derivatives of the $h(\beta, X)$ functions with respect to $\beta$. That is, linearize each of the equations. Then the first-order condition for minimizing this sum of squares is

$$\frac{\partial\,\varepsilon(\beta)'\Omega^{-1}\varepsilon(\beta)}{\partial\beta} = \sum_{i=1}^{M}\sum_{j=1}^{M}\sigma^{ij}\left[2X_i^{0\prime}(\beta)\,\varepsilon_j(\beta)\right] = 0, \qquad (14\text{-}43)$$

where $\sigma^{ij}$ is the $ij$th element of $\Sigma^{-1}$ and $X_i^0(\beta)$ is a $T \times K$ matrix of pseudoregressors from the linearization of the $i$th equation. (See Section 9.2.3.) If any of the parameters in $\beta$ do not appear in the $i$th equation, then the corresponding column of $X_i^0(\beta)$ will be a column of zeros.

This problem of estimation is doubly complex. First, in almost any circumstance, solution will require an iteration using one of the methods discussed in Appendix E. Second, of course, is that $\Sigma$ is not known and must be estimated. Remember that efficient estimation in the multivariate regression model does not require an efficient estimator of $\Sigma$, only a consistent one. Therefore, one approach would be to estimate the parameters of each equation separately using nonlinear least squares. This method will be inefficient if any of the equations share parameters, since that information will be ignored. But at this step, consistency is the objective, not efficiency. The resulting residuals can then be used to compute

$$S = \frac{1}{T}E'E. \qquad (14\text{-}44)$$

The second step of FGLS is the solution of (14-43), which will require an iterative procedure once again and can be based on $S$ instead of $\Sigma$. With well-behaved pseudoregressors, this second-step estimator is fully efficient. Once again, the same theory used for FGLS in the linear, single-equation case applies here.[38] Once the FGLS estimator is obtained, the appropriate asymptotic covariance matrix is estimated with

$$\text{Est. Asy. Var}[\hat\beta] = \left[\sum_{i=1}^{M}\sum_{j=1}^{M}s^{ij}X_i^0(\beta)'X_j^0(\beta)\right]^{-1}.$$

There is a possible flaw in the strategy outlined above. It may not be possible to fit all the equations individually by nonlinear least squares. It is conceivable that identification of some of the parameters requires joint estimation of more than one equation. But as long as the full system identifies all parameters, there is a simple way out of this problem. Recall that all we need for our first step is a consistent set of estimators of the elements of $\beta$. It is easy to show that the preceding defines a GMM estimator (see Chapter 18). We can use this result to devise an alternative, simple strategy. The weighting of the sums of squares and cross products in (14-42) by $\sigma^{ij}$ produces an efficient estimator of $\beta$. Any other weighting based on some positive definite $A$ would produce consistent, although inefficient, estimates. At this step, though, efficiency is secondary, so the choice of $A = I$ is a convenient candidate. Thus, for our first step, we can find $\beta$ to minimize

$$\varepsilon(\beta)'\varepsilon(\beta) = \sum_{i=1}^{M}[y_i - h_i(\beta, X)]'[y_i - h_i(\beta, X)] = \sum_{i=1}^{M}\sum_{t=1}^{T}[y_{it} - h_i(\beta, x_{it})]^2.$$

(This estimator is just pooled nonlinear least squares, where the regression function varies across the sets of observations.) This step will produce the $\hat\beta$ we need to compute $S$.

[38] Neither the nonlinearity nor the multiple equation aspect of this model brings any new statistical issues to the fore. By stacking the equations, we see that this model is simply a variant of the nonlinear regression model that we treated in Chapter 9 with the added complication of a nonscalar disturbance covariance matrix, which we analyzed in Chapter 10. The new complications are primarily practical.

14.4.2 MAXIMUM LIKELIHOOD ESTIMATION

With normally distributed disturbances, the log-likelihood function for this model is still given by (14-18). Therefore, estimation of $\Sigma$ is done exactly as before, using the $S$ in (14-44). Likewise, the concentrated log-likelihood in (14-22) and the criterion function in (14-23) are unchanged. Therefore, one approach to maximum likelihood estimation is iterated FGLS, based on the results in Section 14.2.3. This method will require two levels of iteration, however, since for each estimated $\Sigma(\beta_l)$, written as a function of the estimates of $\beta$ obtained at iteration $l$, a nonlinear, iterative solution is required to obtain $\beta_{l+1}$. The iteration then returns to $S$. Convergence is based either on $S$ or $\hat\beta$; if one stabilizes, then the other will also.

The advantage of direct maximum likelihood estimation that was discussed in Section 14.2.4 is lost here because of the nonlinearity of the regressions; there is no convenient arrangement of parameters into a matrix. But a few practical aspects to formulating the criterion function and its derivatives that may be useful do remain. Estimation of the model in (14-41) might be slightly more convenient if each equation did have its own coefficient vector. Suppose then that there is one underlying parameter vector $\beta$ and that we formulate each equation as

$$y_{it} = h_i[\gamma_i(\beta), x_{it}] + \varepsilon_{it}.$$

Then the derivatives of the log-likelihood function are built up from

$$\frac{\partial\ln|S(\gamma)|}{\partial\gamma_i} = d_i = -\frac{1}{T}\sum_{t=1}^{T}\sum_{j=1}^{M}s^{ij}\,x_{it}^0(\gamma_i)\,e_{jt}(\gamma_j), \quad i = 1, \ldots, M. \qquad (14\text{-}45)$$

It remains to impose the equality constraints that have been built into the model. Since each $\gamma_i$ is built up just by extracting elements from $\beta$, the relevant derivative with respect to $\beta$ is just a sum of those with respect to $\gamma$:

$$\frac{\partial\ln L_c}{\partial\beta_k} = \sum_{i=1}^{n}\sum_{g=1}^{K_i}\frac{\partial\ln L_c}{\partial\gamma_{ig}}\,\mathbf{1}(\gamma_{ig} = \beta_k),$$

where $\mathbf{1}(\gamma_{ig} = \beta_k)$ equals 1 if $\gamma_{ig}$ equals $\beta_k$ and 0 if not. This derivative can be formulated fairly simply as follows. There are a total of $G = \sum_{i=1}^{n}K_i$ parameters in $\gamma$, but only $K$ [...] 0.

Variables that are predetermined in a model can be treated, at least asymptotically, as if they were exogenous in the sense that consistent estimates can be obtained when they appear as regressors. We used this result in Chapters 5 and 12 as well, when we derived the properties of regressions containing lagged values of the dependent variable.

A related concept is Granger causality. Granger causality (a kind of statistical feedback) is absent when $f(x_t \mid x_{t-1}, y_{t-1})$ equals $f(x_t \mid x_{t-1})$. The definition states that in the conditional distribution, lagged values of $y_t$ add no information to explanation of movements of $x_t$ beyond that provided by lagged values of $x_t$ itself. This concept is useful in the construction of forecasting models. Finally, if $x_t$ is weakly exogenous and if $y_{t-1}$ does not Granger cause $x_t$, then $x_t$ is strongly exogenous.

15.2.3 A GENERAL NOTATION FOR LINEAR SIMULTANEOUS EQUATIONS MODELS[6]

The structural form of the model is[7]

$$\gamma_{11}y_{t1} + \gamma_{21}y_{t2} + \cdots + \gamma_{M1}y_{tM} + \beta_{11}x_{t1} + \cdots + \beta_{K1}x_{tK} = \varepsilon_{t1},$$
$$\gamma_{12}y_{t1} + \gamma_{22}y_{t2} + \cdots + \gamma_{M2}y_{tM} + \beta_{12}x_{t1} + \cdots + \beta_{K2}x_{tK} = \varepsilon_{t2},$$
$$\vdots \qquad (15\text{-}2)$$
$$\gamma_{1M}y_{t1} + \gamma_{2M}y_{t2} + \cdots + \gamma_{MM}y_{tM} + \beta_{1M}x_{t1} + \cdots + \beta_{KM}x_{tK} = \varepsilon_{tM}.$$

There are $M$ equations and $M$ endogenous variables, denoted $y_1, \ldots,$
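$y_M$. There are $K$ exogenous variables, $x_1, \ldots, x_K$, that may include predetermined values of $y_1, \ldots, y_M$ as well. The first element of $x_t$ will usually be the constant, 1. Finally, $\varepsilon_{t1}, \ldots, \varepsilon_{tM}$ are the structural disturbances. The subscript $t$ will be used to index observations, $t = 1, \ldots, T$.

[6] We will be restricting our attention to linear models in this chapter. Nonlinear systems occupy another strand of literature in this area. Nonlinear systems bring forth numerous complications beyond those discussed here and are beyond the scope of this text. Gallant (1987), Gallant and Holly (1980), Gallant and White (1988), Davidson and MacKinnon (1993), and Wooldridge (2002) provide further discussion.

[7] For the present, it is convenient to ignore the special nature of lagged endogenous variables and treat them the same as the strictly exogenous variables.

In matrix terms, the system may be written

$$[y_1\ y_2\ \cdots\ y_M]_t\begin{bmatrix}\gamma_{11} & \gamma_{12} & \cdots & \gamma_{1M}\\ \gamma_{21} & \gamma_{22} & \cdots & \gamma_{2M}\\ & & \vdots & \\ \gamma_{M1} & \gamma_{M2} & \cdots & \gamma_{MM}\end{bmatrix} + [x_1\ x_2\ \cdots\ x_K]_t\begin{bmatrix}\beta_{11} & \beta_{12} & \cdots & \beta_{1M}\\ \beta_{21} & \beta_{22} & \cdots & \beta_{2M}\\ & & \vdots & \\ \beta_{K1} & \beta_{K2} & \cdots & \beta_{KM}\end{bmatrix} = [\varepsilon_1\ \varepsilon_2\ \cdots\ \varepsilon_M]_t$$

or

$$y_t'\Gamma + x_t'B = \varepsilon_t'.$$

Each column of the parameter matrices is the vector of coefficients in a particular equation, whereas each row applies to a specific variable.

The underlying theory will imply a number of restrictions on $\Gamma$ and $B$. One of the variables in each equation is labeled the dependent variable so that its coefficient in the model will be 1. Thus, there will be at least one “1” in each column of $\Gamma$. This normalization is not a substantive restriction. The relationship defined for a given equation will be unchanged if every coefficient in the equation is multiplied by the same constant. Choosing a “dependent variable” simply removes this indeterminacy. If there are any identities, then the corresponding columns of $\Gamma$ and $B$ will be completely known, and there will be no disturbance for that equation. Since not all variables appear in all equations, some of the parameters will be zero. The theory may also impose other types of restrictions on the parameter matrices.

If $\Gamma$ is an upper triangular matrix, then the system is said to be triangular. In this case, the model is of the form

$$y_{t1} = f_1(x_t) + \varepsilon_{t1},$$
$$y_{t2} = f_2(y_{t1}, x_t) + \varepsilon_{t2},$$
$$\vdots$$
$$y_{tM} = f_M(y_{t1}, y_{t2}, \ldots, y_{t,M-1}, x_t) + \varepsilon_{tM}.$$

The joint determination of the variables in this model is recursive. The first is completely determined by the exogenous factors. Then, given the first, the second is likewise determined, and so on.

The solution of the system of equations determining $y_t$ in terms of $x_t$ and $\varepsilon_t$ is the reduced form of the model,

$$y_t' = [x_1\ x_2\ \cdots\ x_K]_t\begin{bmatrix}\pi_{11} & \pi_{12} & \cdots & \pi_{1M}\\ \pi_{21} & \pi_{22} & \cdots & \pi_{2M}\\ & & \vdots & \\ \pi_{K1} & \pi_{K2} & \cdots & \pi_{KM}\end{bmatrix} + [\nu_1\ \cdots\ \nu_M]_t$$
$$= -x_t'B\Gamma^{-1} + \varepsilon_t'\Gamma^{-1}$$
$$= x_t'\Pi + v_t'.$$

For this solution to exist, the model must satisfy the completeness condition for simultaneous equations systems: $\Gamma$ must be nonsingular.

Example 15.3 Structure and Reduced Form

For the small model in Example 15.1, $y' = [c, i, y]$, $x' = [1, r, g, c_{-1}, y_{-1}]$, and

$$\Gamma = \begin{bmatrix}1 & 0 & -1\\ 0 & 1 & -1\\ -\alpha_1 & -\beta_2 & 1\end{bmatrix},\quad B = \begin{bmatrix}-\alpha_0 & -\beta_0 & 0\\ 0 & -\beta_1 & 0\\ 0 & 0 & -1\\ -\alpha_2 & 0 & 0\\ 0 & \beta_2 & 0\end{bmatrix},\quad \Gamma^{-1} = \frac{1}{\Delta}\begin{bmatrix}1-\beta_2 & \beta_2 & 1\\ \alpha_1 & 1-\alpha_1 & 1\\ \alpha_1 & \beta_2 & 1\end{bmatrix},$$

$$\Pi' = \frac{1}{\Delta}\begin{bmatrix}\alpha_0(1-\beta_2)+\beta_0\alpha_1 & \alpha_1\beta_1 & \alpha_1 & \alpha_2(1-\beta_2) & -\beta_2\alpha_1\\ \alpha_0\beta_2+\beta_0(1-\alpha_1) & \beta_1(1-\alpha_1) & \beta_2 & \alpha_2\beta_2 & -\beta_2(1-\alpha_1)\\ \alpha_0+\beta_0 & \beta_1 & 1 & \alpha_2 & -\beta_2\end{bmatrix},$$

where $\Delta = 1 - \alpha_1 - \beta_2$. The completeness condition is that $\alpha_1$ and $\beta_2$ do not sum to one.

The structural disturbances are assumed to be randomly drawn from an $M$-variate distribution with

$$E[\varepsilon_t \mid x_t] = 0 \quad\text{and}\quad E[\varepsilon_t\varepsilon_t' \mid x_t] = \Sigma.$$

For the present, we assume that

$$E[\varepsilon_t\varepsilon_s' \mid x_t, x_s] = 0 \quad\text{for all } t \neq s.$$

Later, we will drop this assumption to allow for heteroscedasticity and autocorrelation. It will occasionally be useful to assume that $\varepsilon_t$ has a multivariate normal distribution, but we shall postpone this assumption until it becomes necessary. It may be convenient to retain the identities without disturbances as separate equations. If so, then one way to proceed with the stochastic specification is to place rows and columns of zeros in the appropriate places in $\Sigma$. It follows that the reduced-form disturbances, $v_t' = \varepsilon_t'\Gamma^{-1}$, have

$$E[v_t \mid x_t] = (\Gamma^{-1})'0 = 0,$$
$$E[v_t v_t' \mid x_t] = (\Gamma^{-1})'\Sigma\Gamma^{-1} = \Omega.$$

This implies that

$$\Sigma = \Gamma'\Omega\Gamma.$$

The preceding formulation describes the model as it applies to an observation $[y', x', \varepsilon']_t$ at a particular point in time or in a cross section. In a sample of data, each joint observation will be one row in a data matrix,

$$[Y\ X\ E] = \begin{bmatrix}y_1' & x_1' & \varepsilon_1'\\ y_2' & x_2' & \varepsilon_2'\\ & \vdots & \\ y_T' & x_T' & \varepsilon_T'\end{bmatrix}.$$

In terms of the full set of $T$ observations, the structure is

$$Y\Gamma + XB = E,$$

with

$$E[E \mid X] = 0 \quad\text{and}\quad E[(1/T)E'E \mid X] = \Sigma.$$

Under general conditions, we can strengthen this structure to

$$\operatorname{plim}[(1/T)E'E] = \Sigma.$$

An important assumption, comparable with the one made in Chapter 5 for the classical regression model, is

$$\operatorname{plim}(1/T)X'X = Q,\ \text{a finite positive definite matrix}. \qquad (15\text{-}3)$$

We also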
assume that

$$\operatorname{plim}(1/T)X'E = 0. \qquad (15\text{-}4)$$

This assumption is what distinguishes the predetermined variables from the endogenous variables. The reduced form is

$$Y = X\Pi + V,$$

where $V = E\Gamma^{-1}$. Combining the earlier results, we have

$$\operatorname{plim}\frac{1}{T}\begin{bmatrix}Y'\\ X'\\ V'\end{bmatrix}[Y\ X\ V] = \begin{bmatrix}\Pi'Q\Pi + \Omega & \Pi'Q & \Omega\\ Q\Pi & Q & 0'\\ \Omega & 0 & \Omega\end{bmatrix}. \qquad (15\text{-}5)$$

15.3 THE PROBLEM OF IDENTIFICATION

Solving the problem to be considered here, the identification problem, logically precedes estimation. We ask at this point whether there is any way to obtain estimates of the parameters of the model. We have in hand a certain amount of information upon which to base any inference about its underlying structure. If more than one theory is consistent with the same “data,” then the theories are said to be observationally equivalent and there is no way of distinguishing them. The structure is said to be unidentified.[8]

[8] A useful survey of this issue is Hsiao (1983).

[FIGURE 15.1 Market Equilibria. Panels (a), (b), and (c); vertical axis: P; horizontal axis: Q; supply curves labeled S and demand curves labeled D.]

Example 15.4 Observational Equivalence[9]

The observed data consist of the market outcomes shown in Figure 15.1a. We have no knowledge of the conditions of supply and demand beyond our belief that the data represent equilibria. Unfortunately, parts (b) and (c) of Figure 15.1 both show structures (that is, true underlying supply and demand curves) which are consistent with the data in Figure 15.1a. With only the data in Figure 15.1a, we have no way of determining which of theories 15.1b or c is the right one. Thus, the structure underlying the data in Figure 15.1a is unidentified. To suggest where our discussion is headed, suppose that we add to the preceding the known fact that the conditions of supply were unchanged during the period over which the data were drawn. This rules out 15.1c and identifies 15.1b as the correct structure. Note how this scenario relates to Example 15.1 and to the discussion following that example.

The identification problem is not one of sampling properties or the size of the sample. To focus ideas, it is even useful to suppose that we have at hand an infinite-sized sample of observations on the variables in the model. Now, with this sample and our prior theory, what information do we have? In the reduced form,

$$y_t' = x_t'\Pi + v_t', \quad E[v_t v_t' \mid x_t] = \Omega,$$

the predetermined variables are uncorrelated with the disturbances. Thus, we can “observe”

$$\operatorname{plim}(1/T)X'X = Q\ \text{[assumed; see (15-3)]},$$
$$\operatorname{plim}(1/T)X'Y = \operatorname{plim}(1/T)X'(X\Pi + V) = Q\Pi,$$
$$\operatorname{plim}(1/T)Y'Y = \operatorname{plim}(1/T)(\Pi'X' + V')(X\Pi + V) = \Pi'Q\Pi + \Omega.$$

Therefore, $\Pi$, the matrix of reduced-form coefficients, is observable:

$$\Pi = \left[\operatorname{plim}\frac{X'X}{T}\right]^{-1}\left[\operatorname{plim}\frac{X'Y}{T}\right].$$

[9] This example paraphrases the classic argument of Working (1926).

This estimator is simply the equation-by-equation least squares regression of $Y$ on $X$. Since $\Pi$ is observable, $\Omega$ is also:
−1YYYXXXXY=plim−plim.TTTTThisresultshouldberecognizedasthematrixofleastsquaresresidualvariancesandcovariances.Therefore,andcanbeestimatedconsistentlybyleastsquaresregressionofYonX.Theinformationinhand,therefore,consistsof,,andwhateverothernonsamplein-formationwehaveaboutthestructure.10Now,canwededucethestructuralparametersfromthereducedform?Thecorrespondencebetweenthestructuralandreduced-formparametersistherelationships=−B−1and=E[vv]=(−1)−1.Ifwereknown,thenwecoulddeduceBas−andas.Itwouldappear,therefore,thatourproblemboilsdowntoobtaining,whichmakessense.Ifwereknown,thenwecouldrewrite(15-2),collectingtheendogenousvariablestimestheirrespectivecoefficientsontheleft-handsideofaregression,andestimatetheremainingunknowncoefficientsonthepredeterminedvariablesbyordinaryleastsquares.11Theidentificationquestionwewillpursuecanbeposedasfollows:Wecan“observe”thereducedform.Wemustdeducethestructurefromwhatweknowaboutthereducedform.Ifthereismorethanonestructurethatcanleadtothesamereducedform,thenwecannotsaythatwecan“estimatethestructure.”Whichstructurewouldthatbe?Supposethatthe“true”structureis[,B,].Nowconsideradifferentstructure,y˜+xB˜=ε˜,thatisobtainedbypostmultiplyingthefirststructurebysomenonsingularmatrixF.Thus,˜=F,B˜=BF,ε˜=εF.Thereducedformthatcorrespondstothisnewstructureis,unfortunately,thesameastheonethatcorrespondstotheoldone;˜=−B˜˜−1=−BFF−1−1=,and,inthesamefashion,˜=.Thefalsestructurelooksjustlikethetrueone,atleastintermsoftheinformationwehave.Statistically,thereisnowaywecantellthemapart.Thestructuresareobservationallyequivalent.SinceFwaschosenarbitrarily,weconcludethatanynonsingulartransformationoftheoriginalstructurehasthesamereducedform.Anyreasonforoptimismthatwemighthavehadshouldbeabandoned.Asthemodelstands,thereisnomeansbywhichthestructuralparameterscanbededucedfromthereducedform.Thepracticalimplicationisthatiftheonlyinformationthatwehaveisthereduced-formparameters,thenthestructuralmodelisnotestimable.Sohowwereweabletoidentifythemodels10Wehavenotnecessarilyshownthat
thisisalltheinformationinthesample.Ingeneral,weobservetheconditionaldistributionf(yt|xt),whichconstitutesthelikelihoodforthereducedform.Withnormallydistributeddisturbances,thisdistributionisafunctionof,.(SeeSection15.6.2.)Withotherdistributions,otherorhighermomentsofthevariablesmightprovideadditionalinformation.See,forexample,Goldberger(1964,p.311),Hausman(1983,pp.402–403),andespeciallyRiersøl(1950).11ThismethodispreciselytheapproachoftheLIMLestimator.SeeSection15.5.5.\nGreene-50240bookJune19,200210:10388CHAPTER15✦Simultaneous-EquationsModelsintheearlierexamples?Theanswerisbybringingtobearournonsampleinformation,namelyourtheoreticalrestrictions.Considerthefollowingexamples:Example15.5IdentificationConsideramarketinwhichqisquantityofQ,pisprice,andzisthepriceofZ,arelatedgood.Weassumethatzentersboththesupplyanddemandequations.Forexample,ZmightbeacropthatispurchasedbyconsumersandthatwillbegrownbyfarmersinsteadofQifitspricerisesenoughrelativetop.Thus,wewouldexpectα2>0andβ2<0.So,qd=α0+α1p+α2z+εd(demand),qs=β0+β1p+β2z+εs(supply),qd=qs=q(equilibrium).Thereducedformisα1β0−α0β1α1β2−α2β1α1εs−α2εdq=+z+=π11+π21z+νq,α1−β1α1−β1α1−β1β0−α0β2−α2εs−εdp=+z+=π12+π22z+νp.α1−β1α1−β1α1−β1Withonlyfourreduced-formcoefficientsandsixstructuralparameters,itisobviousthattherewillnotbeacompletesolutionforallsixstructuralparametersintermsofthefourreducedparameters.Suppose,though,thatitisknownthatβ2=0(farmersdonotsubstitutethealternativecropforthisone).Thenthesolutionforβ1isπ21/π22.Afterabitofmanipulation,wealsoobtainβ0=π11−π12π21/π22.Therestrictionidentifiesthesupplyparameters.Butthisstepisasfaraswecango.Now,supposethatincomex,ratherthanz,appearsinthedemandequation.Therevisedmodelisq=α0+α1p+α2x+ε1,q=β0+β1p+β2z+ε2.Thestructureisnow−α0−β011[qp]+[1xz]−α20=[ε1ε2].−α1−β10−β2Thereducedformis(α1β0−α0β1)/(β0−α0)/[qp]=[1xz]−α2β1/−α2/+[ν1ν2],α1β2/β2/where=(α1−β1).Everyfalsestructurehasthesamereducedform.Butinthecoefficientmatrix,α0f11+β0f12α0f12+β0f22B˜=BF=α2f11α2f12,β2f21β2f22iff12isnotzero,thentheimposter
will have income appearing in the supply equation, which our theory has ruled out. Likewise, if $f_{21}$ is not zero, then $z$ will appear in the demand equation, which is also ruled out by our theory. Thus, although all false structures have the same reduced form as the true one, the only one that is consistent with our theory (i.e., is admissible) and has coefficients of 1 on $q$ in both equations (examine $\Gamma F$) is $F = I$. This transformation just produces the original structure.

The unique solutions for the structural parameters in terms of the reduced-form parameters are

$$\alpha_0 = \pi_{11} - \pi_{12}\frac{\pi_{31}}{\pi_{32}}, \qquad \beta_0 = \pi_{11} - \pi_{12}\frac{\pi_{21}}{\pi_{22}},$$
$$\alpha_1 = \frac{\pi_{31}}{\pi_{32}}, \qquad \beta_1 = \frac{\pi_{21}}{\pi_{22}},$$
$$\alpha_2 = \pi_{22}\left(\frac{\pi_{21}}{\pi_{22}} - \frac{\pi_{31}}{\pi_{32}}\right), \qquad \beta_2 = \pi_{32}\left(\frac{\pi_{31}}{\pi_{32}} - \frac{\pi_{21}}{\pi_{22}}\right).$$

The preceding discussion has considered two equivalent methods of establishing identifiability. If it is possible to deduce the structural parameters from the known reduced-form parameters, then the model is identified. Alternatively, if it can be shown that no false structure is admissible (that is, satisfies the theoretical restrictions), then the model is identified.[12]

15.3.1 THE RANK AND ORDER CONDITIONS FOR IDENTIFICATION

It is useful to summarize what we have determined thus far. The unknown structural parameters consist of

$\Gamma$ = an $M \times M$ nonsingular matrix,
$B$ = a $K \times M$ parameter matrix,
$\Sigma$ = an $M \times M$ symmetric positive definite matrix.

The known, reduced-form parameters are

$\Pi$ = a $K \times M$ reduced-form coefficients matrix,
$\Omega$ = an $M \times M$ reduced-form covariance matrix.

Simply counting parameters in the structure and reduced forms yields an excess of

$$l = M^2 + KM + \tfrac{1}{2}M(M+1) - KM - \tfrac{1}{2}M(M+1) = M^2,$$

which is, as might be expected from the earlier results, the number of unknown elements in $\Gamma$. Without further information, identification is clearly impossible. The additional information comes in several forms.

1. Normalizations. In each equation, one variable has a coefficient of 1. This normalization is a necessary scaling of the equation that is logically equivalent to putting one variable on the left-hand side of a regression. For purposes of identification (and some estimation methods), the choice among the endogenous variables is arbitrary. But at the time the model is formulated, each equation will usually have some natural dependent variable. The normalization does not identify the dependent variable in
any formal or causal sense. For example, in a model of supply and demand, both the "demand" equation, $Q = f(P, x)$, and the "inverse demand" equation, $P = g(Q, x)$, are appropriate specifications of the relationship between price and quantity. We note, though, the following: With the normalizations, there are $M(M-1)$, not $M^2$, undetermined values in $\Gamma$ and this many indeterminacies in the model to be resolved through nonsample information.

[12] For other interpretations, see Amemiya (1985, p. 230) and Gabrielsen (1978). Some deeper theoretical results on identification of parameters in econometric models are given by Bekker and Wansbeek (2001).

2. Identities. In some models, variable definitions or equilibrium conditions imply that all the coefficients in a particular equation are known. In the preceding market example, there are three equations, but the third is the equilibrium condition $Q_d = Q_s$. Klein's Model I (Example 15.3) contains six equations, including two accounting identities and the equilibrium condition. There is no question of identification with respect to identities. They may be carried as additional equations in the model, as we do with Klein's Model I in several later examples, or built into the model a priori, as is typical in models of supply and demand.

The substantive nonsample information that will be used in identifying the model will consist of the following:

3. Exclusions. The omission of variables from an equation places zeros in $B$ and $\Gamma$. In Example 15.5, the exclusion of income from the supply equation served to identify its parameters.

4. Linear restrictions. Restrictions on the structural parameters may also serve to rule out false structures. For example, a long-standing problem in the estimation of production models using time-series data is the inability to disentangle the effects of economies of scale from those of technological change. In some treatments, the solution is to assume that there are constant returns to scale, thereby identifying the effects due to technological change.

5. Restrictions on the disturbance covariance matrix. In the identification of a model, these are similar to restrictions on the slope parameters. For example, if the previous market model were to apply to a microeconomic setting, then it
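would probably be reasonable to assume that the structural disturbances in these supply and demand equations are uncorrelated. Section 15.3.3 shows a case in which a covariance restriction identifies an otherwise unidentified model.

To formalize the identification criteria, we require a notation for a single equation. The coefficients of the $j$th equation are contained in the $j$th columns of $\Gamma$ and $B$. The $j$th equation is

$$y'\Gamma_j + x'B_j = \varepsilon_j. \tag{15-6}$$

(For convenience, we have dropped the observation subscript.) In this equation, we know that (1) one of the elements in $\Gamma_j$ is one and (2) some variables that appear elsewhere in the model are excluded from this equation. Table 15.1 defines the notation used to incorporate these restrictions in (15-6). Equation $j$ may be written

$$y_j = Y_j'\gamma_j + Y_j^{*\prime}\gamma_j^* + x_j'\beta_j + x_j^{*\prime}\beta_j^* + \varepsilon_j.$$

TABLE 15.1 Components of Equation j (Dependent Variable = $y_j$)

            Endogenous Variables           Exogenous Variables
Included    $Y_j$ = $M_j$ variables        $x_j$ = $K_j$ variables
Excluded    $Y_j^*$ = $M_j^*$ variables    $x_j^*$ = $K_j^*$ variables

The number of equations is $M_j + M_j^* + 1 = M$.
The number of exogenous variables is $K_j + K_j^* = K$.
The coefficient on $y_j$ in equation $j$ is 1.
*s will always be associated with excluded variables.

The exclusions imply that $\gamma_j^* = 0$ and $\beta_j^* = 0$. Thus,

$$\Gamma_j' = [1 \;\; -\gamma_j' \;\; 0'] \quad\text{and}\quad B_j' = [-\beta_j' \;\; 0'].$$

(Note the sign convention.) For this equation, we partition the reduced-form coefficient matrix in the same fashion, with column blocks of widths 1, $M_j$, and $M_j^*$ and row blocks of $K_j$ and $K_j^*$ rows:

$$[y_j \;\; Y_j' \;\; Y_j^{*\prime}] = [x_j' \;\; x_j^{*\prime}]\begin{bmatrix} \pi_j & \bar\Pi_j & \bar{\bar\Pi}_j \\ \pi_j^* & \bar\Pi_j^* & \bar{\bar\Pi}_j^* \end{bmatrix} + [v_j \;\; V_j' \;\; V_j^{*\prime}]. \tag{15-7}$$

The reduced-form coefficient matrix is $\Pi = -B\Gamma^{-1}$, which implies that $\Pi\Gamma = -B$. The $j$th column of this matrix equation applies to the $j$th equation, $\Pi\Gamma_j = -B_j$. Inserting the parts from Table 15.1 yields

$$\begin{bmatrix} \pi_j & \bar\Pi_j & \bar{\bar\Pi}_j \\ \pi_j^* & \bar\Pi_j^* & \bar{\bar\Pi}_j^* \end{bmatrix}\begin{bmatrix} 1 \\ -\gamma_j \\ 0 \end{bmatrix} = \begin{bmatrix} \beta_j \\ 0 \end{bmatrix}.$$

Now extract the two subequations,

$$\pi_j - \bar\Pi_j\gamma_j = \beta_j \quad (K_j \text{ equations}), \tag{15-8}$$
$$\pi_j^* - \bar\Pi_j^*\gamma_j = 0 \quad (K_j^* \text{ equations}). \tag{15-9}$$

The solution for $B$ in terms of $\Gamma$ that we observed at the beginning of this discussion is in (15-8). Equation (15-9) may be written

$$\bar\Pi_j^*\gamma_j = \pi_j^*. \tag{15-10}$$

This system is $K_j^*$ equations in $M_j$ unknowns. If they can be solved for $\gamma_j$, then (15-8) gives the solution for $\beta_j$ and the equation is identified. For there to be a solution, there must be at least as many equations as unknowns, which leads to the following condition.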
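As a numerical illustration of (15-8) and (15-10), the following sketch recovers the structural coefficients of one equation from a known reduced-form matrix. It uses the revised supply and demand model of Example 15.5. The parameter values, the function name `solve_equation_from_reduced_form`, and its index arguments are all hypothetical choices made for this illustration; they do not come from the text.

```python
import numpy as np

def solve_equation_from_reduced_form(Pi, dep, incl_endog, incl_exog, excl_exog):
    """Solve (15-10) for gamma_j, then (15-8) for beta_j.

    Pi is the K x M reduced-form coefficient matrix. The remaining arguments
    are column (endogenous) or row (exogenous) indices; this interface is a
    hypothetical one chosen for the illustration.
    """
    # There must be at least as many equations (K*_j) as unknowns (M_j).
    if len(excl_exog) < len(incl_endog):
        raise ValueError("fewer equations than unknowns in (15-10)")
    pi_star = Pi[excl_exog, dep]                       # pi*_j, length K*_j
    Pi_bar_star = Pi[np.ix_(excl_exog, incl_endog)]    # K*_j x M_j block
    # A unique solution also needs full column rank.
    if np.linalg.matrix_rank(Pi_bar_star) < len(incl_endog):
        raise ValueError("no unique solution for gamma_j")
    gamma, *_ = np.linalg.lstsq(Pi_bar_star, pi_star, rcond=None)
    # (15-8): beta_j = pi_j - Pi_bar_j gamma_j
    beta = Pi[incl_exog, dep] - Pi[np.ix_(incl_exog, incl_endog)] @ gamma
    return gamma, beta

# Hypothetical structural values for the revised model of Example 15.5.
a0, a1, a2 = 10.0, -1.0, 0.5      # demand: q = a0 + a1*p + a2*x + e1
b0, b1, b2 = 2.0, 1.0, -0.3       # supply: q = b0 + b1*p + b2*z + e2

Gamma = np.array([[1.0, 1.0], [-a1, -b1]])            # rows: q, p
B = np.array([[-a0, -b0], [-a2, 0.0], [0.0, -b2]])    # rows: 1, x, z
Pi = -B @ np.linalg.inv(Gamma)                        # reduced form, columns q, p

# Demand: q on p, includes (1, x), excludes z.
gamma_d, beta_d = solve_equation_from_reduced_form(Pi, 0, [1], [0, 1], [2])
# Supply: q on p, includes (1, z), excludes x.
gamma_s, beta_s = solve_equation_from_reduced_form(Pi, 0, [1], [0, 2], [1])
print(gamma_d, beta_d)
print(gamma_s, beta_s)
```

Under these assumed values the function returns the price coefficient and the included exogenous coefficients of each equation, up to rounding. Least squares is used to solve (15-10) only so that the same call also handles the case with more equations than unknowns; with exactly as many equations as unknowns it reduces to an ordinary linear solve.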
DEFINITION 15.1 Order Condition for Identification of Equation j

$$K_j^* \ge M_j. \tag{15-11}$$

The number of exogenous variables excluded from equation $j$ must be at least as large as the number of endogenous variables included in equation $j$.

The order condition is only a counting rule. It is a necessary but not sufficient condition for identification. It ensures that (15-10) has at least one solution, but it does not ensure that it has only one solution. The sufficient condition for uniqueness follows.

DEFINITION 15.2 Rank Condition for Identification

$$\operatorname{rank}[\pi_j^*, \bar\Pi_j^*] = \operatorname{rank}[\bar\Pi_j^*] = M_j.$$

This condition imposes a restriction on a submatrix of the reduced-form coefficient matrix.

The rank condition ensures that there is exactly one solution for the structural parameters given the reduced-form parameters. Our alternative approach to the identification problem was to use the prior restrictions on $[\Gamma, B]$ to eliminate all false structures. An equivalent condition based on this approach is simpler to apply and has more intuitive appeal. We first rearrange the structural coefficients in the matrix

$$A = \begin{bmatrix} \Gamma \\ B \end{bmatrix} = \begin{bmatrix} 1 & A_1 \\ -\gamma_j & A_2 \\ 0 & A_3 \\ -\beta_j & A_4 \\ 0 & A_5 \end{bmatrix} = [a_j \;\; A_j]. \tag{15-12}$$

The $j$th column in a false structure $[\Gamma F, BF]$ (i.e., the imposter for our equation $j$) would be $[\Gamma f_j, Bf_j]$, where $f_j$ is the $j$th column of $F$. This new $j$th equation is to be built up as a linear combination of the old one and the other equations in the model. Thus, partitioning as previously,

$$\tilde a_j = \begin{bmatrix} 1 & A_1 \\ -\gamma_j & A_2 \\ 0 & A_3 \\ -\beta_j & A_4 \\ 0 & A_5 \end{bmatrix}\begin{bmatrix} f^0 \\ f^1 \end{bmatrix} = \begin{bmatrix} 1 \\ \tilde\gamma_j \\ 0 \\ \tilde\beta_j \\ 0 \end{bmatrix}.$$

If this hybrid is to have the same variables as the original, then it must have nonzero elements in the same places, which can be ensured by taking $f^0 = 1$, and zeros in the same positions as the original $a_j$. Extracting the third and fifth blocks of rows, if $\tilde a_j$ is to be admissible, then it must meet the requirement

$$\begin{bmatrix} A_3 \\ A_5 \end{bmatrix} f^1 = 0.$$

This equality is not possible for a nonzero $f^1$ if the $(M_j^* + K_j^*) \times (M - 1)$ matrix in brackets has full column rank, so we have the equivalent rank condition,

$$\operatorname{rank}\begin{bmatrix} A_3 \\ A_5 \end{bmatrix} = M - 1.$$

The corresponding order condition is that the matrix in brackets must have at least as many rows as columns. Thus, $M_j^* + K_j^* \ge M - 1$. But since $M = M_j + M_j^* + 1$, this condition is the same as the order condition in (15-11). The equivalence of the two rank conditions is pursued in the exercises.

The preceding provides a simple method for checking the rank and order conditions. We need only arrange the structural parameters in a tableau and examine the relevant submatrices one at a time; $A_3$ and $A_5$ are the structural coefficients in the other equations on the variables that are excluded from equation $j$.

One rule of thumb is sometimes useful in checking the rank and order conditions of a model: If every equation has its own predetermined variable, the entire model is identified. The proof is simple and is left as an exercise. For a final example, we consider a somewhat larger model.

Example 15.6 Identification of Klein's Model I

The structural coefficients in the six equations of Klein's Model I, transposed and multiplied by $-1$ for convenience, are listed in Table 15.2. Identification of the consumption function requires that the matrix $[A_3', A_5']$ have rank 5. The columns of this matrix are contained in boxes in the table. None of the columns indicated by arrows can be formed as linear combinations of the other columns, so the rank condition is satisfied. Verification of the rank and order conditions for the other two equations is left as an exercise.

TABLE 15.2 Klein's Model I, Structural Coefficients

       |            Γ'                  |                    B'
       |  C    I    Wp   X    P    K    |  1    Wg   G    T    A    P−1  K−1  X−1
  C    | −1    0    α3   0    α1   0    |  α0   α3   0    0    0    α2   0    0
  I    |  0   −1    0    0    β1   0    |  β0   0    0    0    0    β2   β3   0
  Wp   |  0    0   −1    γ1   0    0    |  γ0   0    0    0    γ3   0    0    γ2
  X    |  1    1    0   −1    0    0    |  0    0    1    0    0    0    0    0
  P    |  0    0   −1    1   −1    0    |  0    0    0   −1    0    0    0    0
  K    |  0    1    0    0    0   −1    |  0    0    0    0    0    0    1    0

[For the consumption function, $A_3'$ consists of the columns for the excluded endogenous variables I, X, and K, and $A_5'$ of the columns for the excluded exogenous variables G, T, A, K−1, and X−1. The boxes and the five arrows of the printed table are not reproduced here.]

It is unusual for a model to pass the order but not the rank condition. Generally, either the conditions are obvious or the model is so large and has so many predetermined variables that the conditions are met trivially. In practice, it is simple to check both conditions for a small model. For a large model, frequently only the order condition is verified. We distinguish three cases:

1. Underidentified. $K_j^* < M_j$ or rank condition fails.
2. Exactly identified. $K_j^* = M_j$ and rank condition is met.
3. Overidentified. $K_j^* > M_j$ and rank condition is met.

15.3.2 IDENTIFICATION THROUGH OTHER NONSAMPLE INFORMATION

The rank and order conditions given in the preceding section apply to identification of an equation through exclusion restrictions. Intuition might suggest that other types of nonsample information should be equally useful in securing identification. To take a specific example, suppose that in Example 15.5, it is known that $\beta_2$ equals 2, not 0. The second equation could then be written as

$$q_s - 2z = q_s^* = \beta_0 + \beta_1 p + \beta_j^* z + \varepsilon_2.$$

But we know that $\beta_j^* = 0$, so the supply equation is identified by this restriction. As this example suggests, a linear restriction on the parameters within an equation is, for identification purposes, essentially the same as an exclusion.[13] By an appropriate manipulation (that is, by "solving out" the restriction) we can turn the restriction into one more exclusion. The order condition that emerges is

$$n_j \ge M - 1,$$

where $n_j$ is the total number of restrictions. Since $M - 1 = M_j + M_j^*$ and $n_j$ is the number of exclusions plus $r_j$, the number of additional restrictions, this condition is equivalent to

$$r_j + K_j^* + M_j^* \ge M_j + M_j^*$$

or

$$r_j + K_j^* \ge M_j.$$

This result is the same as (15-11) save for the addition of the number of restrictions, which is the result suggested previously.

[13] The analysis is more complicated if the restrictions are across equations, that is, involve the parameters in more than one equation. Kelly (1975) contains a number of results and examples.

15.3.3 IDENTIFICATION THROUGH COVARIANCE RESTRICTIONS: THE FULLY RECURSIVE MODEL

The observant reader will have noticed that no mention of $\Sigma$ is made in the preceding discussion. To this point, all the information provided by $\Omega$ is used in the estimation of $\Sigma$; for given $\Gamma$, the relationship between $\Omega$ and $\Sigma$ is one-to-one. Recall that $\Sigma = \Gamma'\Omega\Gamma$. But if restrictions are placed on $\Sigma$, then there is more information in $\Omega$ than is needed for estimation of $\Sigma$. The excess information can be used instead to help infer the elements in $\Gamma$. A useful case is that of zero covariances across the disturbances.[14] Once again, it is most convenient to consider this case in terms of a false structure. If the structure is $[\Gamma, B, \Sigma]$, then a false structure would have parameters

$$[\tilde\Gamma, \tilde B, \tilde\Sigma] = [\Gamma F, BF, F'\Sigma F].$$

If any of the elements in $\Sigma$ are zero, then the false structure must preserve those restrictions to be admissible. For example, suppose that we specify that $\sigma_{12} = 0$. Then it must also be true that $\tilde\sigma_{12} = f_1'\Sigma f_2 = 0$, where $f_1$ and $f_2$ are columns of $F$. As such, there is a restriction on $F$ that may identify the model.

[14] More general cases are discussed in Hausman (1983) and Judge et al. (1985).

The fully recursive model is an important special case of the preceding result. A triangular system is

$$y_1 = \beta_1'x + \varepsilon_1,$$
$$y_2 = \gamma_{12}y_1 + \beta_2'x + \varepsilon_2,$$
$$\vdots$$
$$y_M = \gamma_{1M}y_1 + \gamma_{2M}y_2 + \cdots + \gamma_{M-1,M}y_{M-1} + \beta_M'x + \varepsilon_M.$$

We place no restrictions on $B$. The first equation is identified, since it is already in reduced form. But for any of the others, linear combinations of it and the ones above it involve the same variables. Thus, we conclude that without some identifying restrictions, only the parameters of the first equation in a triangular system are identified. But suppose that $\Sigma$ is diagonal. Then the entire model is identified, as we now prove. As usual, we attempt to find a false structure that satisfies the restrictions of the model.

The $j$th column of $F$, $f_j$, is the coefficients in a linear combination of the equations that will be an imposter for equation $j$. Many $f_j$'s are already precluded.

1. $f_1$ must be the first column of an identity matrix. The first equation is identified and normalized on $y_1$.
2. In all remaining columns of $F$, all elements below the diagonal must be zero, since an equation can only involve the $y$s in it or in the equations above it.

Without further restrictions, any upper triangular $F$ is an admissible transformation. But with a diagonal $\Sigma$, we have more information. Consider the second column. Since $\tilde\Sigma$ must be diagonal, $f_1'\Sigma f_2 = 0$. But given $f_1$ in 1 above,

$$f_1'\Sigma f_2 = \sigma_{11}f_{12} = 0,$$

so $f_{12} = 0$. The second column of $F$ is now complete and is equal to the second column of $I$. Continuing in the same manner, we find that

$$f_1'\Sigma f_3 = 0 \quad\text{and}\quad f_2'\Sigma f_3 = 0$$

will suffice to establish that $f_3$ is the third column of $I$. In this fashion, it can be shown that the only admissible $F$ is $F = I$, which was to be shown. With $\Gamma$ upper triangular, $M(M-1)/2$ unknown parameters remained. That is exactly the number of restrictions placed on $\Sigma$ when it was assumed to be diagonal.

15.4 METHODS OF ESTIMATION

It is possible to estimate the reduced-form parameters, $\Pi$ and $\Omega$, consistently by ordinary least squares. But except for forecasting $y$ given $x$, these are generally not the parameters of interest; $\Gamma$, $B$, and $\Sigma$ are. The ordinary least squares (OLS) estimators of the structural parameters are inconsistent, ostensibly because the included endogenous variables in each equation are correlated with the disturbances. Still, it is at least of passing interest to examine what is estimated by ordinary least squares, particularly in view of its widespread use (despite its inconsistency). Since the proof of identification was based on solving for $\Gamma$, $B$, and $\Sigma$ from $\Pi$ and $\Omega$, one way to proceed is to apply our finding to the sample estimates, $P$ and $W$. This indirect least squares approach is
feasible but inefficient. Worse, there will usually be more than one possible estimator and no obvious means of choosing among them. There are two approaches for direct estimation, both based on the principle of instrumental variables. It is possible to estimate each equation separately using a limited information estimator. But the same principle that suggests that joint estimation brings efficiency gains in the seemingly unrelated regressions setting of the previous chapter is at work here, so we shall also consider full information or system methods of estimation.

15.5 SINGLE EQUATION: LIMITED INFORMATION ESTIMATION METHODS

Estimation of the system one equation at a time has the benefit of computational simplicity. But because these methods neglect information contained in the other equations, they are labeled limited information methods.

15.5.1 ORDINARY LEAST SQUARES

For all $T$ observations, the nonzero terms in the $j$th equation are

$$y_j = Y_j\gamma_j + X_j\beta_j + \varepsilon_j = Z_j\delta_j + \varepsilon_j.$$

The $M$ reduced-form equations are $Y = X\Pi + V$. For the included endogenous variables $Y_j$, the reduced forms are the $M_j$ appropriate columns of $\Pi$ and $V$, written

$$Y_j = X\Pi_j + V_j. \tag{15-13}$$

[Note that $\Pi_j$ is the middle part of $\Pi$ shown in (15-7).] Likewise, $V_j$ is $M_j$ columns of $V = E\Gamma^{-1}$. This least squares estimator is

$$d_j = [Z_j'Z_j]^{-1}Z_j'y_j = \delta_j + \begin{bmatrix} Y_j'Y_j & Y_j'X_j \\ X_j'Y_j & X_j'X_j \end{bmatrix}^{-1}\begin{bmatrix} Y_j'\varepsilon_j \\ X_j'\varepsilon_j \end{bmatrix}.$$

None of the terms in the inverse matrix converge to 0. Although $\operatorname{plim}(1/T)X_j'\varepsilon_j = 0$, $\operatorname{plim}(1/T)Y_j'\varepsilon_j$ is nonzero, which means that both parts of $d_j$ are inconsistent. (This is the "simultaneous equations bias" of least squares.) Although we can say with certainty that $d_j$ is inconsistent, we cannot state how serious this problem is. OLS does have the virtue of computational simplicity, although with modern software, this virtue is extremely modest. For better or worse, OLS is a very commonly used estimator in this context. We will return to this issue later in a comparison of several estimators.

An intuitively appealing form of simultaneous equations model is the triangular system that we examined in Section 15.3.3,

$$(1)\quad y_1 = x'\beta_1 + \varepsilon_1,$$
$$(2)\quad y_2 = x'\beta_2 + \gamma_{12}y_1 + \varepsilon_2,$$
$$(3)\quad y_3 = x'\beta_3 + \gamma_{13}y_1 + \gamma_{23}y_2 + \varepsilon_3,$$

and so on. If $\Gamma$ is triangular and $\Sigma$ is diagonal, so that the disturbances are uncorrelated, then the system is a fully recursive model. (No restrictions are placed on $B$.) It is
easy to see that in this case, the entire system may be estimated consistently (and, as we shall show later, efficiently) by ordinary least squares. The first equation is a classical regression model. In the second equation, $\operatorname{Cov}(y_1, \varepsilon_2) = \operatorname{Cov}(x'\beta_1 + \varepsilon_1, \varepsilon_2) = 0$, so it too may be estimated by ordinary least squares. Proceeding in the same fashion to (3), it is clear that $y_1$ and $\varepsilon_3$ are uncorrelated. Likewise, if we substitute (1) in (2) and then the result for $y_2$ in (3), then we find that $y_2$ is also uncorrelated with $\varepsilon_3$. Continuing in this way, we find that in every equation the full set of right-hand variables is uncorrelated with the respective disturbance. The result is that the fully recursive model may be consistently estimated using equation-by-equation ordinary least squares. (In the more general case, in which $\Sigma$ is not diagonal, the preceding argument does not apply.)

15.5.2 ESTIMATION BY INSTRUMENTAL VARIABLES

In the next several sections, we will discuss various methods of consistent and efficient estimation. As will be evident quite soon, there is a surprisingly long menu of choices. It is a useful result that all of the methods in general use can be placed under the umbrella of instrumental variable (IV) estimators.

Returning to the structural form, we first consider direct estimation of the $j$th equation,

$$y_j = Y_j\gamma_j + X_j\beta_j + \varepsilon_j = Z_j\delta_j + \varepsilon_j. \tag{15-14}$$

As we saw previously, the OLS estimator of $\delta_j$ is inconsistent because of the correlation of $Z_j$ and $\varepsilon_j$. A general method of obtaining consistent estimates is the method of instrumental variables. (See Section 5.4.) Let $W_j$ be a $T \times (M_j + K_j)$ matrix that satisfies the requirements for an IV estimator,

$$\operatorname{plim}(1/T)W_j'Z_j = \Sigma_{wz} = \text{a finite nonsingular matrix}, \tag{15-15a}$$
$$\operatorname{plim}(1/T)W_j'\varepsilon_j = 0, \tag{15-15b}$$
$$\operatorname{plim}(1/T)W_j'W_j = \Sigma_{ww} = \text{a positive definite matrix}. \tag{15-15c}$$

Then the IV estimator,

$$\hat\delta_{j,IV} = [W_j'Z_j]^{-1}W_j'y_j,$$

will be consistent and have asymptotic covariance matrix

$$\text{Asy. Var}[\hat\delta_{j,IV}] = \frac{\sigma_{jj}}{T}\operatorname{plim}\left[\frac{1}{T}W_j'Z_j\right]^{-1}\left[\frac{1}{T}W_j'W_j\right]\left[\frac{1}{T}Z_j'W_j\right]^{-1} = \frac{\sigma_{jj}}{T}\left[\Sigma_{wz}^{-1}\Sigma_{ww}\Sigma_{zw}^{-1}\right]. \tag{15-16}$$

A consistent estimator of $\sigma_{jj}$ is

$$\hat\sigma_{jj} = \frac{(y_j - Z_j\hat\delta_{j,IV})'(y_j - Z_j\hat\delta_{j,IV})}{T}, \tag{15-17}$$

which is the familiar sum of squares of the estimated disturbances divided by $T$. A degrees of freedom correction for the denominator, $T - M_j - K_j$, is sometimes suggested. Asymptotically, the correction is immaterial. Whether it is beneficial in a small sample remains to be settled. The resulting estimator is not unbiased in any event, as it would be in the classical regression model. In the interest of simplicity (only), we shall omit the degrees of freedom correction in what follows. Current practice in most applications is to make the correction.

The various estimators that have been developed for simultaneous-equations models are all IV estimators. They differ in the choice of instruments and in whether the equations are estimated one at a time or jointly. We divide them into two classes, limited information or full information, on this basis.

15.5.3 TWO-STAGE LEAST SQUARES

The method of two-stage least squares is the most common method used for estimating simultaneous-equations models. We developed the full set of results for this estimator in Section 5.4. By merely changing notation slightly, the results of Section 5.4 are exactly the derivation of the estimator we will describe here. Thus, you might want to review this section before continuing.

The two-stage least squares (2SLS) method consists of using as the instruments for $Y_j$ the predicted values in a regression of $Y_j$ on all the $x$s in the system:

$$\hat Y_j = X[(X'X)^{-1}X'Y_j] = XP_j. \tag{15-18}$$

It can be shown that absent heteroscedasticity or autocorrelation, this produces the most efficient IV estimator that can be formed using only the columns of $X$. Note the emulation of $E[Y_j] = X\Pi_j$ in the result. The 2SLS estimator is, thus,

$$\hat\delta_{j,2SLS} = \begin{bmatrix} \hat Y_j'Y_j & \hat Y_j'X_j \\ X_j'Y_j & X_j'X_j \end{bmatrix}^{-1}\begin{bmatrix} \hat Y_j'y_j \\ X_j'y_j \end{bmatrix}. \tag{15-19}$$

Before proceeding, it is important to emphasize the role of the identification condition in this result. In the matrix $[\hat Y_j, X_j]$, which has $M_j + K_j$ columns, all columns are linear functions of the $K$ columns of $X$. There exist, at most, $K$ linearly independent combinations of the columns of $X$. If the equation is not identified, then $M_j + K_j$ is greater than $K$, and $[\hat Y_j, X_j]$ will not have full column rank. In this case, the 2SLS estimator cannot be computed. If, however, the order condition
butnottherankcon-ditionismet,thenalthoughthe2SLSestimatorcanbecomputed,itisnotaconsistentestimator.Thereareafewusefulsimplifications.First,sinceX(XX)−1X=(I−M)is\nGreene-50240bookJune19,200210:10CHAPTER15✦Simultaneous-EquationsModels399idempotent,YˆY=YˆYˆ.Second,XX(XX)−1X=XimpliesthatXY=XYˆ.jjjjjjjjjjThus,(15-19)canalsobewritten−1YˆYˆYˆXYˆyδˆjjjjjjj,2SLS=.(15-20)XYˆXXXyjjjjjjThe2SLSestimatorisobtainedbyordinaryleastsquaresregressionofyjonYˆjandXj.Thus,thenamestemsfromthetworegressionsintheprocedure:1.Stage1.ObtaintheleastsquarespredictionsfromregressionofYjonX.2.Stage2.EstimateδjbyleastsquaresregressionofyjonYˆjandXj.Adirectproofoftheconsistencyofthe2SLSestimatorrequiresonlythatweestablishthatitisavalidIVestimator.For(15-15a),werequireYˆY/TYˆX/TPX(XII+V)/TPXX/Tjjjjjjjjjplim=plimXY/TXX/TX(XII+V)/TXX/Tjjjjjjjjjtobeafinitenonsingularmatrix.Wehaveused(15-13)forYj,whichisacontinuousfunctionofPj,whichhasplimPj=j.TheSlutskytheoremthusallowsustosubstitutejforPjintheprobabilitylimit.Thatthepartsconvergetoafinitematrixfollowsfrom(15-3)and(15-5).Itwillbenonsingularifjhasfullcolumnrank,which,inturn,willbetrueiftheequationisidentified.15For(15-15b),werequirethat1Yˆjεj0plim=.TXε0jjThesecondpartisassumedin(15-4).Forthefirst,bydirectsubstitution,YX−11Yˆ−1jXXXεjplimjX(XX)Xεj=plim.TTTTThethirdpartontherightconvergestozero,whereastheothertwoconvergetofinitematrices,whichconfirmstheresult.Sinceδˆj,2SLSisanIVestimator,wecanjustinvokeTheorem5.3fortheasymptoticdistribution.Aproofofasymptoticefficiencyrequirestheestablishmentofthebenchmark,whichweshalldointhediscussionoftheMLE.Asafinalshortcutthatisusefulforprogrammingpurposes,wenotethatifXjisregressedonX,thenaperfectfitisobtained,soXˆj=Xj.Usingtheidempotentmatrix(I−M),(15-20)becomes−1Yj(I−M)YjYj(I−M)XjYj(I−M)yjδˆj,2SLS=.X(I−M)YX(I−M)XX(I−M)yjjjjjjThus,δˆ=[ZˆZˆ]−1Zˆyj,2SLSjjjj(15-21)=[(ZX)(XX)−1(XZ)]−1(ZX)(XX)−1Xy,jjjjwhereallcolumnsofZˆareobtainedaspredictionsinaregressionofthecorrespondingj15Schmidt(1976,pp.150–151)providesaproofofthisres
ult.\nGreene-50240bookJune19,200210:10400CHAPTER15✦Simultaneous-EquationsModelscolumnofZjonX.Thisequationalsoresultsinausefulsimplificationoftheestimatedasymptoticcovariancematrix,Est.Asy.Var[δˆ]=σˆ[ZˆZˆ]−1.j,2SLSjjjjItisimportanttonotethatσjjisestimatedby(y−Zδˆ)(y−Zδˆ)jjjjjjσˆjj=,Tusingtheoriginaldata,notZˆj.15.5.4GMMESTIMATIONTheGMMestimatorinSection10.4is,withaminorchangeofnotation,preciselythesetofprocedureswehavebeenusinghere.Usingthismethod,however,willallowustogeneralizethecovariancestructureforthedisturbances.Weassumethaty=zδ+ε,jtjtjjtwherezjt=[Yjt,xjt](weusethecapitalYjttodenotetheLjincludedendogenousvari-ables).Thusfar,wehaveassumedthatεjtinthejthequationisneitherheteroscedasticnorautocorrelated.Thereisnoneedtoimposethoseassumptionsatthispoint.Autocor-relationinthecontextofasimultaneousequationsmodelisasubstantialcomplication,however.Forthepresent,wewillconsidertheheteroscedasticcaseonly.Theassumptionsofthemodelprovidetheorthogonalityconditions,E[xε]=E[x(y−zδ)]=0.tjttjtjtjIfxtistakentobethefullsetofexogenousvariablesinthemodel,thenweobtainthecriterionfortheGMMestimator,e(z,δ)XXe(z,δ)tj−1tjq=WjjTT=m¯(δ)W−1m¯(δ),jjjjwhere1Tm¯(δ)=x(y−zδ)andW−1=theGMMweightingmatrix.jtjtjtjjjTt=1Onceagain,thisispreciselytheestimatordefinedinSection10.4[see(10-17)].Ifthedisturbancesareassumedtobehomoscedasticandnonautocorrelated,thentheoptimalweightingmatrixwillbeanestimatoroftheinverseof√Wjj=Asy.Var[Tm¯(δj)]1T=plimxx(y−zδ)2ttjtjtjTt=11T=plimσxxjjttTt=1σ(XX)jj=plimT\nGreene-50240bookJune19,200210:10CHAPTER15✦Simultaneous-EquationsModels401Theconstantσisirrelevanttothesolution.Ifweuse(XX)−1astheweightingmatrix,jjthentheGMMestimatorthatminimizesqisthe2SLSestimator.Theextensionthatwecanobtainhereistoallowforheteroscedasticityofun-knownform.Thereisnoneedtorederivetheearlierresult.Ifthedisturbancesareheteroscedastic,then1TXXjjWjj=plimωjj,txtxt=plim.TTt=1TheweightingmatrixcanbeestimatedwithWhite’sconsistentestimator—see(10-23)—ifaconsistentestimatorofδjisinhandwithwhichtocomputethere
siduals.Oneis,since2SLSignoringtheheteroscedasticityisconsistent,albeitinefficient.Theconclusionthenisthatundertheseassumptions,thereisawaytoimproveon2SLSbyaddinganotherstep.Thename3SLSisreservedforthesystemsestimatorofthissort.Whenchoosingbetween2.5-stageleastsquaresandDavidsonandMacKinnon’ssuggested“heteroscedastic2SLS,orH2SLS,”wechosetooptforthelatter.Theestimatorisbasedontheinitialtwo-stageleastsquaresprocedure.Thus,δˆ=[ZX(S)−1XZ]−1[ZX(S)−1Xy],j,H2SLSj0,jjjj0,jjjwhereTS=xx(y−zδˆ)2.0,jjttjtjtj,2SLSt=1TheasymptoticcovariancematrixisestimatedwithEst.Asy.Var[δˆ]=[ZX(S)−1XZ]−1.j,H2SLSj0,jjjExtensionsofthisestimatorweresuggestedbyCragg(1983)andCumby,Huizinga,andObstfeld(1983).15.5.5LIMITEDINFORMATIONMAXIMUMLIKELIHOODANDTHEKCLASSOFESTIMATORSThelimitedinformationmaximumlikelihood(LIML)estimatorisbasedonasingleequationundertheassumptionofnormallydistributeddisturbances;LIMLisefficientamongsingle-equationestimators.Afull(lengthy)derivationofthelog-likelihoodisprovidedinTheil(1971)andDavidsonandMacKinnon(1993).Wewillproceedtothepracticalaspectsofthisestimatorandreferthereadertothesesourcesforthebackgroundformalities.AresultthatemergesfromthederivationisthattheLIMLestimatorhasthesameasymptoticdistributionasthe2SLSestimator,andthelatterdoesnotrelyonanassumptionofnormality.ThisraisesthequestionwhyonewouldusetheLIMLtechniquegiventheavailabilityofthemorerobust(andcomputationallysimpler)alternative.Smallsampleresultsaresparse,buttheywouldfavor2SLSaswell.[SeePhillips(1983).]TheonesignificantvirtueofLIMLisitsinvariancetothenormalizationoftheequation.Consideranexampleinasystemofequations,y1=y2γ2+y3γ3+x1β1+x2β2+ε1.\nGreene-50240bookJune19,200210:10402CHAPTER15✦Simultaneous-EquationsModelsAnequivalentequationwouldbey2=y1(1/γ2)+y3(−γ3/γ2)+x1(−β1/γ2)+x2(−β2/γ2)+ε1(−1/γ2)=y1γ˜1+y3γ˜3+x1β˜1+x2β˜2+ε˜1Theparametersofthesecondequationcanbemanipulatedtoproducethoseofthefirst.But,asyoucaneasilyverify,the2SLSestimatorisnotinvarianttothenormalizationoftheequation—2SLSwouldproducenumericallydiff
erent answers. LIML would give the same numerical solutions to both estimation problems suggested above. The LIML, or least variance ratio estimator, can be computed as follows.[16] Let

$$W_j^0 = E_j^{0\prime} E_j^0, \tag{15-22}$$

where $Y_j^0 = [y_j, Y_j]$ and

$$E_j^0 = M_j Y_j^0 = [I - X_j(X_j'X_j)^{-1}X_j']\,Y_j^0. \tag{15-23}$$

Each column of $E_j^0$ is a set of least squares residuals in the regression of the corresponding column of $Y_j^0$ on $X_j$, that is, the exogenous variables that appear in the jth equation. Thus, $W_j^0$ is the matrix of sums of squares and cross products of these residuals. Define

$$W_j^1 = E_j^{1\prime} E_j^1 = Y_j^{0\prime}[I - X(X'X)^{-1}X']\,Y_j^0. \tag{15-24}$$

That is, $W_j^1$ is defined like $W_j^0$ except that the regressions are on all the xs in the model, not just the ones in the jth equation. Let

$$\lambda_1 = \text{smallest characteristic root of } (W_j^1)^{-1} W_j^0. \tag{15-25}$$

This matrix is asymmetric, but all its roots are real and greater than or equal to 1. Depending on the available software, it may be more convenient to obtain the identical smallest root of the symmetric matrix $D = (W_j^1)^{-1/2} W_j^0 (W_j^1)^{-1/2}$. Now partition $W_j^0$ into

$$W_j^0 = \begin{bmatrix} w_{jj}^0 & w_j^{0\prime} \\ w_j^0 & W_{jj}^0 \end{bmatrix}$$

corresponding to $[y_j, Y_j]$, and partition $W_j^1$ likewise. Then, with these parts in hand,

$$\hat\gamma_{j,LIML} = [W_{jj}^0 - \lambda_1 W_{jj}^1]^{-1}(w_j^0 - \lambda_1 w_j^1) \tag{15-26}$$

and

$$\hat\beta_{j,LIML} = [X_j'X_j]^{-1}X_j'(y_j - Y_j\hat\gamma_{j,LIML}).$$

Note that $\beta_j$ is estimated by a simple least squares regression. [See (3-18).] The asymptotic covariance matrix for the LIML estimator is identical to that for the 2SLS estimator.[17] The implication is that with normally distributed disturbances, 2SLS is fully efficient.

[16] The least variance ratio estimator is derived in Johnston (1984). The LIML estimator was derived by Anderson and Rubin (1949, 1950).

The "k class" of estimators is defined by the following form

$$\hat\delta_{j,k} = \begin{bmatrix} Y_j'Y_j - kV_j'V_j & Y_j'X_j \\ X_j'Y_j & X_j'X_j \end{bmatrix}^{-1} \begin{bmatrix} Y_j'y_j - kV_j'v_j \\ X_j'y_j \end{bmatrix}.$$

We have already considered three members of the class, OLS with $k = 0$, 2SLS with $k = 1$, and, it can be shown, LIML with $k = \lambda_1$. [This last result follows from (15-26).] There have been many other k-class estimators derived; Davidson and MacKinnon (1993, pp. 649–651) and Mariano (2001) give discussion. It has been shown that all members of the k class for which $k$ converges to 1 at a rate faster than $1/\sqrt{n}$ have the same asymptotic distribution as that of the 2SLS estimator that we examined earlier. These are largely of theoretical interest, given the pervasive use of 2SLS or OLS, save for an important consideration. The large-sample properties of all k-class estimators are the same, but the finite-sample properties are possibly very different. Davidson and MacKinnon (1993) and Mariano (1982, 2001) suggest that some evidence favors LIML when the sample size is small or moderate and the number of overidentifying restrictions is relatively large.

15.5.6 TWO-STAGE LEAST SQUARES IN MODELS THAT ARE NONLINEAR IN VARIABLES

The analysis of simultaneous equations becomes considerably more complicated when the equations are nonlinear. Amemiya presents a general treatment of nonlinear models.[18] A case that is broad enough to include many practical applications is the one analyzed by Kelejian (1971),

$$y_j = \gamma_{1j} f_{1j}(y, x) + \gamma_{2j} f_{2j}(y, x) + \cdots + X_j\beta_j + \varepsilon_j,$$

[19] which is an extension of (7-4). Ordinary least squares will be inconsistent for the same reasons as before, but an IV estimator, if one can be devised, should have the familiar properties. Because of the nonlinearity, it may not be possible to solve for the reduced-form equations (assuming that they exist), $h_{ij}(x) = E[f_{ij} \mid x]$. Kelejian shows that 2SLS based on a Taylor series approximation to $h_{ij}$, using the linear terms, higher powers, and cross-products of the variables in $x$, will be consistent. The analysis of 2SLS presented earlier then applies to the $Z_j$ consisting of $[\hat f_{1j}, \hat f_{2j}, \ldots, X_j]$. [The alternative approach of using fitted values for $y$ appears to be inconsistent. See Kelejian (1971) and Goldfeld and Quandt (1968).]

In a linear model, if an equation fails the order condition, then it cannot be estimated by 2SLS. This statement is not true of Kelejian's approach, however, since taking higher powers of the regressors creates many more linearly independent instrumental variables. If an equation in a linear model fails the rank condition but not the order

[17] This is proved by showing that both estimators are members of the "k class" of estimators, all of which have the same asymptotic covariance matrix. Details are given in Theil (1971) and Schmidt (1976).
[18] Amemiya (1985, pp. 245–265). See, as well, Wooldridge (2002, ch. 9).
[19] 2SLS for models that are nonlinear in the parameters is discussed in Chapters 10 and 11 in connection with GMM estimators.
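The LIML computation in (15-22) to (15-26) and the k-class form can be sketched numerically. The block below is a minimal illustration under stated assumptions, not the text's own software: the function names (`residual_maker`, `liml_root`, `k_class`, `simulate`) are hypothetical, the data are simulated from an arbitrary overidentified equation, and $V_j$ and $v_j$ are taken to be the least squares residuals from regressing $Y_j$ and $y_j$ on all the exogenous variables.

```python
import numpy as np


def residual_maker(X):
    """M = I - X (X'X)^{-1} X', the matrix that produces least squares residuals."""
    return np.eye(X.shape[0]) - X @ np.linalg.solve(X.T @ X, X.T)


def liml_root(y, Y, Xj, X):
    """Smallest characteristic root of (W1)^{-1} W0, following (15-22) to (15-25)."""
    Y0 = np.column_stack([y, Y])            # Y0_j = [y_j, Y_j]
    W0 = Y0.T @ residual_maker(Xj) @ Y0     # residuals on the included x's only
    W1 = Y0.T @ residual_maker(X) @ Y0      # residuals on all the x's
    roots = np.linalg.eigvals(np.linalg.solve(W1, W0))
    return float(np.min(roots.real))        # the roots are real in theory


def k_class(y, Y, Xj, X, k):
    """k-class estimate of [gamma_j, beta_j]; k = 0 gives OLS, k = 1 gives 2SLS."""
    M = residual_maker(X)
    V, v = M @ Y, M @ y                     # assumed: reduced-form residuals
    A = np.block([[Y.T @ Y - k * (V.T @ V), Y.T @ Xj],
                  [Xj.T @ Y, Xj.T @ Xj]])
    b = np.concatenate([Y.T @ y - k * (V.T @ v), Xj.T @ y])
    return np.linalg.solve(A, b)


def simulate(T=200, seed=0):
    """Hypothetical equation: one endogenous regressor, two included and
    two excluded exogenous variables (so it is overidentified)."""
    rng = np.random.default_rng(seed)
    X = np.column_stack([np.ones(T), rng.normal(size=(T, 3))])
    Xj = X[:, :2]                           # constant and first exogenous variable
    u = rng.normal(size=T)
    e = 0.8 * u + 0.6 * rng.normal(size=T)  # structural disturbance, correlated with u
    Y = (X @ np.array([1.0, 0.5, 1.0, -1.0]) + u).reshape(-1, 1)
    y = 0.5 * Y[:, 0] + Xj @ np.array([1.0, -0.5]) + e
    return y, Y, Xj, X


if __name__ == "__main__":
    y, Y, Xj, X = simulate()
    lam = liml_root(y, Y, Xj, X)
    for name, k in [("OLS", 0.0), ("2SLS", 1.0), ("LIML", lam)]:
        print(name, k, k_class(y, Y, Xj, X, k))
```

Writing all three estimators through one function makes the role of $k$ explicit: only the weight on the reduced-form residual moments changes. Forming the full $T \times T$ residual maker is wasteful for large samples; it is used here only to keep the sketch close to the notation of the text.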
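condition, then the 2SLS estimates can be computed in a finite sample but will fail to exist asymptotically because $X\Pi_j$ will have short rank. Unfortunately, to the extent that Kelejian's approximation never exactly equals the true reduced form unless it happens to be the polynomial in $x$ (unlikely), this built-in control need not be present, even asymptotically. Thus, although the model in Example 15.7 (below) is unidentified, computation of Kelejian's 2SLS estimator appears to be routine.

Example 15.7 A Nonlinear Model of Industry Structure
The following model of industry structure and performance was estimated by Strickland and Weiss (1976). Note that the square of the endogenous variable, C, appears in the first equation.

$$A = \alpha_0 + \alpha_1 M + \alpha_2 Cd + \alpha_3 C + \alpha_4 C^2 + \alpha_5 Gr + \alpha_6 D + \varepsilon_1,$$
$$C = \beta_0 + \beta_1 A + \beta_2 MES + \varepsilon_2,$$
$$M = \gamma_0 + \gamma_1 K + \gamma_2 Gr + \gamma_3 C + \gamma_4 Gd + \gamma_5 A + \gamma_6 MES + \varepsilon_3.$$

S = industry sales, M = price cost margin,
A = advertising/S, D = durable goods industry (0/1),
C = concentration, Gr = industry growth rate,
Cd = consumer demand/S, K = capital stock/S,
MES = efficient scale/S, Gd = geographic dispersion.

Since the only restrictions are exclusions, we may check identification by the rule rank$[A_3', A_5'] = M - 1$ discussed in Section 15.3.1. Identification of the first equation requires

$$[A_3', A_5'] = \begin{bmatrix} \beta_2 & 0 & 0 \\ \gamma_6 & \gamma_1 & \gamma_4 \end{bmatrix}$$

to have rank two, which it does unless $\beta_2 = 0$. Thus, the first equation is identified by the presence of the scale variable in the second equation. It is easily seen that the second equation is overidentified. But for the third,

$$[A_3', A_5'] = \begin{bmatrix} \alpha_4 & \alpha_2 & \alpha_6 \\ 0 & 0 & 0 \end{bmatrix} \ (!),$$

which has rank one, not two. The third equation is not identified. It passes the order condition but fails the rank condition. The failure of the third equation is obvious on inspection. There is no variable in the second equation that is not in the third. Nonetheless, it was possible to obtain two stage least squares estimates because of the nonlinearity of the model and the results discussed above.

15.6 SYSTEM METHODS OF ESTIMATION

We may formulate the full system of equations as

$$\begin{bmatrix} y_1 \\ y_2 \\ \vdots \\ y_M \end{bmatrix} = \begin{bmatrix} Z_1 & 0 & \cdots & 0 \\ 0 & Z_2 & \cdots & 0 \\ \vdots & \vdots & \ddots & \vdots \\ 0 & 0 & \cdots & Z_M \end{bmatrix} \begin{bmatrix} \delta_1 \\ \delta_2 \\ \vdots \\ \delta_M \end{bmatrix} + \begin{bmatrix} \varepsilon_1 \\ \varepsilon_2 \\ \vdots \\ \varepsilon_M \end{bmatrix} \tag{15-27}$$

or

$$y = Z\delta + \varepsilon,$$

where

$$E[\varepsilon \mid X] = 0, \quad \text{and} \quad E[\varepsilon\varepsilon' \mid X] = \bar\Sigma = \Sigma \otimes I \tag{15-28}$$

[see (14-3).] The least squares estimator,

$$d = [Z'Z]^{-1}Z'y,$$

is equation-by-equation ordinary least squares and is inconsistent. But even if ordinary least squares were consistent, we know from our results for the seemingly unrelated regressions model in the previous chapter that it would be inefficient compared with an estimator that makes use of the cross-equation correlations of the disturbances. For the first issue, we turn once again to an IV estimator. For the second, as we did in Chapter 14, we use a generalized least squares approach. Thus, assuming that the matrix of instrumental variables, $\bar W$, satisfies the requirements for an IV estimator, a consistent though inefficient estimator would be

$$\hat\delta_{IV} = [\bar W'Z]^{-1}\bar W'y. \tag{15-29}$$

Analogous to the seemingly unrelated regressions model, a more efficient estimator would be based on the generalized least squares principle,

$$\hat\delta_{IV,GLS} = [\bar W'(\Sigma^{-1} \otimes I)Z]^{-1}\bar W'(\Sigma^{-1} \otimes I)y \tag{15-30}$$

or, where $W_j$ is the set of instrumental variables for the jth equation,

$$\hat\delta_{IV,GLS} = \begin{bmatrix} \sigma^{11}W_1'Z_1 & \sigma^{12}W_1'Z_2 & \cdots & \sigma^{1M}W_1'Z_M \\ \sigma^{21}W_2'Z_1 & \sigma^{22}W_2'Z_2 & \cdots & \sigma^{2M}W_2'Z_M \\ \vdots & \vdots & & \vdots \\ \sigma^{M1}W_M'Z_1 & \sigma^{M2}W_M'Z_2 & \cdots & \sigma^{MM}W_M'Z_M \end{bmatrix}^{-1} \begin{bmatrix} \sum_{j=1}^M \sigma^{1j}W_1'y_j \\ \sum_{j=1}^M \sigma^{2j}W_2'y_j \\ \vdots \\ \sum_{j=1}^M \sigma^{Mj}W_M'y_j \end{bmatrix}.$$

Three techniques are generally used for joint estimation of the entire system of equations: three-stage least squares, GMM, and full information maximum likelihood.

15.6.1 THREE-STAGE LEAST SQUARES

Consider the IV estimator formed from

$$\bar W = \hat Z = \operatorname{diag}[X(X'X)^{-1}X'Z_1, \ldots, X(X'X)^{-1}X'Z_M] = \begin{bmatrix} \hat Z_1 & 0 & \cdots & 0 \\ 0 & \hat Z_2 & \cdots & 0 \\ \vdots & \vdots & \ddots & \vdots \\ 0 & 0 & \cdots & \hat Z_M \end{bmatrix}.$$

The IV estimator

$$\hat\delta_{IV} = [\hat Z'Z]^{-1}\hat Z'y$$

is simply equation-by-equation 2SLS. We have already established the consistency of 2SLS. By analogy to the seemingly unrelated regressions model of Chapter 14, however, we would expect this estimator to be less efficient than a GLS estimator. A natural candidate would be

$$\hat\delta_{3SLS} = [\hat Z'(\Sigma^{-1} \otimes I)Z]^{-1}\hat Z'(\Sigma^{-1} \otimes I)y.$$

For this estimator to be a valid IV estimator, we must establish that

$$\operatorname{plim} \frac{1}{T}\hat Z'(\Sigma^{-1} \otimes I)\varepsilon = 0,$$

which is $M$ sets of equations, each one of the form

$$\operatorname{plim} \frac{1}{T}\sum_{j=1}^M \sigma^{ij}\hat Z_i'\varepsilon_j = 0.$$

Each is the sum of vectors all of which converge to zero, as we saw in the development of the 2SLS estimator. The second requirement, that

$$\operatorname{plim} \frac{1}{T}\hat Z'(\Sigma^{-1} \otimes I)Z \neq 0,$$

and that the matrix be nonsingular, can be established along the lines of its counterpart for 2SLS. Identification of every equation by the rank condition is sufficient. [But, see Mariano (2001) on the subject of "weak instruments."]

Once again using the idempotency of $I - M$, we may also interpret this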
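estimator as a GLS estimator of the form

$$\hat\delta_{3SLS} = [\hat Z'(\Sigma^{-1} \otimes I)\hat Z]^{-1}\hat Z'(\Sigma^{-1} \otimes I)y. \tag{15-31}$$

The appropriate asymptotic covariance matrix for the estimator is

$$\text{Asy. Var}[\hat\delta_{3SLS}] = [\bar Z'(\Sigma^{-1} \otimes I)\bar Z]^{-1}, \tag{15-32}$$

where $\bar Z = \operatorname{diag}[X\Pi_j, X_j]$. This matrix would be estimated with the bracketed inverse matrix in (15-31).

Using sample data, we find that $\bar Z$ may be estimated with $\hat Z$. The remaining difficulty is to obtain an estimate of $\Sigma$. In estimation of the multivariate regression model, for efficient estimation (that remains to be shown), any consistent estimator of $\Sigma$ will do. The designers of the 3SLS method, Zellner and Theil (1962), suggest the natural choice arising out of the two-stage least squares estimates. The three-stage least squares (3SLS) estimator is thus defined as follows:

1. Estimate $\Pi$ by ordinary least squares and compute $\hat Y_j$ for each equation.
2. Compute $\hat\delta_{j,2SLS}$ for each equation; then

$$\hat\sigma_{ij} = \frac{(y_i - Z_i\hat\delta_i)'(y_j - Z_j\hat\delta_j)}{T}. \tag{15-33}$$

3. Compute the GLS estimator according to (15-31) and an estimate of the asymptotic covariance matrix according to (15-32) using $\hat Z$ and $\hat\Sigma$.

It is also possible to iterate the 3SLS computation. Unlike the seemingly unrelated regressions estimator, however, this method does not provide the maximum likelihood estimator, nor does it improve the asymptotic efficiency.[20]

[20] A Jacobian term needed to maximize the log-likelihood is not treated by the 3SLS estimator. See Dhrymes (1973).

By showing that the 3SLS estimator satisfies the requirements for an IV estimator, we have established its consistency. The question of asymptotic efficiency remains. It can be shown that among all IV estimators that use only the sample information embodied in the system, 3SLS is asymptotically efficient.[21] For normally distributed disturbances, it can also be shown that 3SLS has the same asymptotic distribution as the full-information maximum likelihood estimator, which is asymptotically efficient among all estimators. A direct proof based on the information matrix is possible, but we shall take a much simpler route by simply exploiting a handy result due to Hausman in the next section.

[21] See Schmidt (1976) for a proof of its efficiency relative to 2SLS.

15.6.2 FULL-INFORMATION MAXIMUM LIKELIHOOD

Because of their simplicity and asymptotic efficiency, 2SLS and 3SLS are used almost exclusively (when ordinary least squares is not used) for the estimation of simultaneous-equations models. Nonetheless, it is occasionally useful to obtain maximum likelihood estimates directly. The full-information maximum likelihood (FIML) estimator is based on the entire system of equations. With normally distributed disturbances, FIML is efficient among all estimators.

The FIML estimator treats all equations and all parameters jointly. To formulate the appropriate log-likelihood function, we begin with the reduced form,

$$Y = X\Pi + V,$$

where each row of $V$ is assumed to be multivariate normally distributed, with $E[v_t \mid X] = 0$ and covariance matrix, $E[v_t v_t' \mid X] = \Omega$. The log-likelihood for this model is precisely that of the seemingly unrelated regressions model of Chapter 14. For the moment, we can ignore the relationship between the structural and reduced-form parameters. Thus, from (14-20),

$$\ln L = -\frac{T}{2}\left[M\ln(2\pi) + \ln|\Omega| + \operatorname{tr}(\Omega^{-1}W)\right],$$

where

$$W_{ij} = \frac{1}{T}(y_i - X\pi_i^0)'(y_j - X\pi_j^0)$$

and

$$\pi_j^0 = j\text{th column of } \Pi.$$

This function is to be maximized subject to all the restrictions imposed by the structure. Make the substitutions $\Pi = -B\Gamma^{-1}$ and $\Omega = (\Gamma^{-1})'\Sigma\Gamma^{-1}$ so that $\Omega^{-1} = \Gamma\Sigma^{-1}\Gamma'$. Thus,

$$\ln L = -\frac{T}{2}\left[M\ln(2\pi) + \ln|(\Gamma^{-1})'\Sigma\Gamma^{-1}| + \operatorname{tr}\left\{\frac{1}{T}\left[\Gamma\Sigma^{-1}\Gamma'(Y + XB\Gamma^{-1})'(Y + XB\Gamma^{-1})\right]\right\}\right],$$

which can be simplified. First,

$$-\frac{T}{2}\ln|(\Gamma^{-1})'\Sigma\Gamma^{-1}| = -\frac{T}{2}\ln|\Sigma| + T\ln|\Gamma|.$$

Second, $\Gamma'(Y + XB\Gamma^{-1})' = \Gamma'Y' + B'X'$. By permuting $\Gamma$ from the beginning to the end of the trace and collecting terms,

$$\operatorname{tr}(\Omega^{-1}W) = \operatorname{tr}\left[\frac{\Sigma^{-1}(Y\Gamma + XB)'(Y\Gamma + XB)}{T}\right].$$

Therefore, the log-likelihood is

$$\ln L = -\frac{T}{2}\left[M\ln(2\pi) - 2\ln|\Gamma| + \operatorname{tr}(\Sigma^{-1}S) + \ln|\Sigma|\right],$$

where

$$s_{ij} = \frac{1}{T}(Y\Gamma_i + XB_i)'(Y\Gamma_j + XB_j).$$

[In terms of nonzero parameters, $s_{ij}$ is $\hat\sigma_{ij}$ of (15-33).]

In maximizing $\ln L$, it is necessary to impose all the additional restrictions on the structure. The trace may be written in the form

$$\operatorname{tr}(\Sigma^{-1}S) = \frac{\sum_{i=1}^M\sum_{j=1}^M \sigma^{ij}(y_i - Y_i\gamma_i - X_i\beta_i)'(y_j - Y_j\gamma_j - X_j\beta_j)}{T}. \tag{15-34}$$

Maximizing $\ln L$ subject to the exclusions in (15-34) and any other restrictions, if necessary, produces the FIML estimator. This has all the desirable asymptotic properties of maximum likelihood estimators and, therefore, is asymptotically efficient among estimators of the simultaneous-equations model. The asymptotic covariance matrix for the FIML estimator is the same as that for the 3SLS estimator.

A useful interpretation of the FIML estimator is provided by Dhrymes (1973, p. 360) and Hausman (1975, 1983). They show that the FIML estimator of $\delta$ is a fixed point in the equation

$$\hat\delta_{FIML} = [\hat Z(\hat\delta)'(\hat\Sigma^{-1} \otimes I)Z]^{-1}[\hat Z(\hat\delta)'(\hat\Sigma^{-1} \otimes I)y] = [\hat{\hat Z}'Z]^{-1}\hat{\hat Z}'y,$$

where

$$\hat Z(\hat\delta)'(\hat\Sigma^{-1} \otimes I) = \begin{bmatrix} \hat\sigma^{11}\hat Z_1' & \hat\sigma^{12}\hat Z_1' & \cdots & \hat\sigma^{1M}\hat Z_1' \\ \hat\sigma^{12}\hat Z_2' & \hat\sigma^{22}\hat Z_2' & \cdots & \hat\sigma^{2M}\hat Z_2' \\ \vdots & \vdots & \cdots & \vdots \\ \hat\sigma^{1M}\hat Z_M' & \hat\sigma^{2M}\hat Z_M' & \cdots & \hat\sigma^{MM}\hat Z_M' \end{bmatrix} = \hat{\hat Z}'$$

and

$$\hat Z_j = [X\hat\Pi_j, X_j].$$

$\hat\Pi$ is computed from the structural estimates:

$$\hat\Pi_j = M_j \text{ columns of } -\hat B\hat\Gamma^{-1}$$

and

$$\hat\sigma_{ij} = \frac{1}{T}(y_i - Z_i\hat\delta_i)'(y_j - Z_j\hat\delta_j) \quad \text{and} \quad \hat\sigma^{ij} = (\hat\Sigma^{-1})_{ij}.$$

This result implies that the FIML estimator is also an IV estimator. The asymptotic covariance matrix for the FIML estimator follows directly from its form as an IV estimator. Since this matrix is the same as that of the 3SLS estimator, we conclude that with normally distributed disturbances, 3SLS has the same asymptotic distribution as maximum likelihood. The practical usefulness of this important result has not gone unnoticed by practitioners. The 3SLS estimator is far easier to compute than the FIML estimator. The benefit in computational cost comes at no cost in asymptotic efficiency. As always, the small-sample properties remain ambiguous, but by and large, where a systems estimator is used, 3SLS dominates FIML nonetheless.[22] (One reservation arises from the fact that the 3SLS estimator is robust to nonnormality whereas, because of the term $\ln|\Gamma|$ in the log-likelihood, the FIML estimator is not. In fact, the 3SLS and FIML estimators are usually quite different numerically.)

[22] PC-GIVE(8), SAS, and TSP(4.2) are three computer programs that are widely used. A survey is given in Silk (1996).

15.6.3 GMM ESTIMATION

The GMM estimator for a system of equations is described in Section 14.4.3. As in the single-equation case, a minor change in notation produces the estimators of this chapter. As before, we will consider the case of unknown heteroscedasticity only. The extension to autocorrelation is quite complicated. [See Cumby, Huizinga, and Obstfeld (1983).] The orthogonality conditions defined in (14-46) are

$$E[x_t\varepsilon_{jt}] = E[x_t(y_{jt} - z_{jt}'\delta_j)] = 0.$$

If we consider all the equations jointly, then we obtain the criterion for estimation of all the model's parameters,

$$q = \sum_{j=1}^M\sum_{l=1}^M \left[\frac{e(z_t, \delta_j)'X}{T}\right][W]^{jl}\left[\frac{X'e(z_t, \delta_l)}{T}\right] = \sum_{j=1}^M\sum_{l=1}^M \bar m(\delta_j)'[W]^{jl}\bar m(\delta_l),$$

where

$$\bar m(\delta_j) = \frac{1}{T}\sum_{t=1}^T x_t(y_{jt} - z_{jt}'\delta_j)$$

and

$$[W]^{jl} = \text{block } jl \text{ of the weighting matrix, } W^{-1}.$$

As before, we consider the optimal weighting matrix obtained as the asymptotic covariance matrix of the empirical moments, $\bar m(\delta_j)$. These moments are stacked in a single vector $\bar m(\delta)$. Then, the jlth block of Asy. Var$[\sqrt{T}\,\bar m(\delta)]$ is

$$\Phi_{jl} = \operatorname{plim}\left\{\frac{1}{T}\sum_{t=1}^T\left[x_t x_t'(y_{jt} - z_{jt}'\delta_j)(y_{lt} - z_{lt}'\delta_l)\right]\right\} = \operatorname{plim}\left[\frac{1}{T}\sum_{t=1}^T \omega_{jl,t}\,x_t x_t'\right].$$

If the disturbances are homoscedastic, then $\Phi_{jl} = \sigma_{jl}[\operatorname{plim}(X'X/T)]$ is produced. Otherwise, we obtain a matrix of the form $\Phi_{jl} = \operatorname{plim}[X'\Omega_{jl}X/T]$. Collecting terms, then, the criterion function for GMM estimation is

$$q = \begin{bmatrix} [X'(y_1 - Z_1\delta_1)]/T \\ [X'(y_2 - Z_2\delta_2)]/T \\ \vdots \\ [X'(y_M - Z_M\delta_M)]/T \end{bmatrix}' \begin{bmatrix} \Phi_{11} & \Phi_{12} & \cdots & \Phi_{1M} \\ \Phi_{21} & \Phi_{22} & \cdots & \Phi_{2M} \\ \vdots & \vdots & \cdots & \vdots \\ \Phi_{M1} & \Phi_{M2} & \cdots & \Phi_{MM} \end{bmatrix}^{-1} \begin{bmatrix} [X'(y_1 - Z_1\delta_1)]/T \\ [X'(y_2 - Z_2\delta_2)]/T \\ \vdots \\ [X'(y_M - Z_M\delta_M)]/T \end{bmatrix}.$$

For implementation, $\Phi_{jl}$ can be estimated with

$$\hat\Phi_{jl} = \frac{1}{T}\sum_{t=1}^T x_t x_t'(y_{jt} - z_{jt}'d_j)(y_{lt} - z_{lt}'d_l),$$

where $d_j$ is a consistent estimator of $\delta_j$. The two-stage least squares estimator is a natural choice. For the diagonal blocks, this choice is the White estimator as usual. For the off-diagonal blocks, it is a simple extension. With this result in hand, the first-order conditions for GMM estimation are

$$\frac{\partial\hat q}{\partial\delta_j} = -2\sum_{l=1}^M\left(\frac{Z_j'X}{T}\right)\hat\Phi^{jl}\left[\frac{X'(y_l - Z_l\delta_l)}{T}\right] = 0,$$

where $\hat\Phi^{jl}$ is the jlth block in the inverse of the estimate of the center matrix in $q$. The solution is

$$\begin{bmatrix} \hat\delta_{1,GMM} \\ \hat\delta_{2,GMM} \\ \vdots \\ \hat\delta_{M,GMM} \end{bmatrix} = \begin{bmatrix} Z_1'X\hat\Phi^{11}X'Z_1 & Z_1'X\hat\Phi^{12}X'Z_2 & \cdots & Z_1'X\hat\Phi^{1M}X'Z_M \\ Z_2'X\hat\Phi^{21}X'Z_1 & Z_2'X\hat\Phi^{22}X'Z_2 & \cdots & Z_2'X\hat\Phi^{2M}X'Z_M \\ \vdots & \vdots & \cdots & \vdots \\ Z_M'X\hat\Phi^{M1}X'Z_1 & Z_M'X\hat\Phi^{M2}X'Z_2 & \cdots & Z_M'X\hat\Phi^{MM}X'Z_M \end{bmatrix}^{-1} \begin{bmatrix} \sum_{j=1}^M Z_1'X\hat\Phi^{1j}X'y_j \\ \sum_{j=1}^M Z_2'X\hat\Phi^{2j}X'y_j \\ \vdots \\ \sum_{j=1}^M Z_M'X\hat\Phi^{Mj}X'y_j \end{bmatrix}.$$

The asymptotic covariance matrix for the estimator would be estimated with $T$ times the large inverse matrix in brackets.

Several of the estimators we have already considered are special cases:

• If $\hat\Phi_{jj} = \hat\sigma_{jj}(X'X/T)$ and $\hat\Phi_{jl} = 0$ for $j \neq l$, then $\hat\delta_j$ is 2SLS.
• If $\hat\Phi_{jl} = 0$ for $j \neq l$, then $\hat\delta_j$ is H2SLS, the single-equation GMM estimator.
• If $\hat\Phi_{jl} = \hat\sigma_{jl}(X'X/T)$, then $\hat\delta_j$ is 3SLS.

As before, the GMM estimator brings efficiency gains in the presence of heteroscedasticity. If the disturbances are homoscedastic, then it is asymptotically the same as 3SLS, [although in a finite sample, it will differ numerically because $\hat\Phi_{jl}$ will not be identical to $\hat\sigma_{jl}(X'X/T)$].

15.6.4 RECURSIVE SYSTEMS AND EXACTLY IDENTIFIED EQUATIONS

Finally, there are two special cases worth noting. First, for the fully recursive model,

1. $\Gamma$ is upper triangular, with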
ones on the diagonal. Therefore, $|\Gamma| = 1$ and $\ln|\Gamma| = 0$.
2. $\Sigma$ is diagonal, so $\ln|\Sigma| = \sum_{j=1}^M \ln\sigma_{jj}$ and the trace in the exponent becomes

$$\operatorname{tr}(\Sigma^{-1}S) = \sum_{j=1}^M \frac{1}{\sigma_{jj}}\,\frac{1}{T}(y_j - Y_j\gamma_j - X_j\beta_j)'(y_j - Y_j\gamma_j - X_j\beta_j).$$

The log-likelihood reduces to $\ln L = \sum_{j=1}^M \ln L_j$, where

$$\ln L_j = -\frac{T}{2}[\ln(2\pi) + \ln\sigma_{jj}] - \frac{1}{2\sigma_{jj}}(y_j - Y_j\gamma_j - X_j\beta_j)'(y_j - Y_j\gamma_j - X_j\beta_j).$$

Therefore, the FIML estimator for this model is just equation-by-equation least squares. We found earlier that ordinary least squares was consistent in this setting. We now find that it is asymptotically efficient as well.

The second interesting special case occurs when every equation is exactly identified. In this case, $K_j^* = M_j$ in every equation. It is straightforward to show that in this case, 2SLS = 3SLS = LIML = FIML, and $\hat\delta_j = [X'Z_j]^{-1}X'y_j$.

15.7 COMPARISON OF METHODS: KLEIN'S MODEL I

The preceding has described a large number of estimators for simultaneous-equations models. As an example, Table 15.3 presents limited- and full-information estimates for Klein's Model I based on the original data for 1921 to 1941. The H3SLS estimates for the system were computed in two pairs, (C, I) and (C, Wp), because there were insufficient observations to fit the system as a whole. The first of these are reported for the C equation.[23]

[23] The asymptotic covariance matrix for the LIML estimator will differ from that for the 2SLS estimator in a finite sample because the estimator of $\sigma_{jj}$ that multiplies the inverse matrix will differ and because in computing the matrix to be inverted, the value of "k" (see the equation after (15-26)) is one for 2SLS and the smallest root in (15-25) for LIML. Asymptotically, $k$ equals one and the estimators of $\sigma_{jj}$ are equivalent.

It might seem, in light of the entire discussion, that one of the structural estimators described previously should always be preferred to ordinary least squares, which, alone among the estimators considered here, is inconsistent. Unfortunately, the issue is not so clear. First, it is often found that the OLS estimator is surprisingly close to the structural estimator. It can be shown that at least in some cases, OLS has a smaller variance about its mean than does 2SLS about its mean, leading to the possibility that OLS might be more precise in a mean-squared-error sense.[24] But this result must be tempered by the finding that the OLS standard errors are, in all likelihood, not useful for inference purposes.[25] Nonetheless, OLS is a frequently used estimator. Obviously, this discussion is relevant only to finite samples. Asymptotically, 2SLS must dominate OLS, and in a correctly specified model, any full-information estimator must dominate any limited-information one. The finite-sample properties are of crucial importance. Most of what we know is asymptotic properties, but most applications are based on rather small or moderately sized samples.

[24] See Goldberger (1964, pp. 359–360).
[25] Cragg (1967).

TABLE 15.3 Estimates of Klein's Model I (Estimated Asymptotic Standard Errors in Parentheses)

      Limited-Information Estimates          Full-Information Estimates

      2SLS                                   3SLS
C     16.6     0.017    0.216    0.810       16.4     0.125    0.163    0.790
      (1.32)   (0.118)  (0.107)  (0.040)     (1.30)   (0.108)  (0.100)  (0.033)
I     20.3     0.150    0.616    −0.158      28.2     −0.013   0.756    −0.195
      (7.54)   (0.173)  (0.162)  (0.036)     (6.79)   (0.162)  (0.153)  (0.038)
Wp    1.50     0.439    0.147    0.130       1.80     0.400    0.181    0.150
      (1.15)   (0.036)  (0.039)  (0.029)     (1.12)   (0.032)  (0.034)  (0.028)

      LIML                                   FIML
C     17.1     −0.222   0.396    0.823       18.3     −0.232   0.388    0.802
      (1.84)   (0.202)  (0.174)  (0.055)     (2.49)   (0.312)  (0.217)  (0.036)
I     22.6     0.075    0.680    −0.168      27.3     −0.801   1.052    −0.146
      (9.24)   (0.219)  (0.203)  (0.044)     (7.94)   (0.491)  (0.353)  (0.30)
Wp    1.53     0.434    0.151    0.132       5.79     0.234    0.285    0.235
      (2.40)   (0.137)  (0.135)  (0.065)     (1.80)   (0.049)  (0.045)  (0.035)

      GMM (H2SLS)                            GMM (H3SLS)
C     14.3     0.090    0.143    0.864       15.7     0.068    0.167    0.829
      (0.897)  (0.062)  (0.065)  (0.029)     (0.951)  (0.091)  (0.080)  (0.033)
I     23.5     0.146    0.591    −0.171      20.6     0.213    −0.520   −0.157
      (6.40)   (0.120)  (0.129)  (0.031)     (4.89)   (0.087)  (0.099)  (0.025)
Wp    3.06     0.455    0.106    0.130       2.09     0.446    0.131    0.112
      (0.64)   (0.028)  (0.030)  (0.022)     (0.510)  (0.019)  (0.021)  (0.021)

      OLS                                    I3SLS
C     16.2     0.193    0.090    0.796       16.6     0.165    0.177    0.766
      (1.30)   (0.091)  (0.091)  (0.040)     (1.22)   (0.096)  (0.090)  (0.035)
I     10.1     0.480    0.333    −0.112      42.9     −0.356   1.01     −0.260
      (5.47)   (0.097)  (0.101)  (0.027)     (10.6)   (0.260)  (0.249)  (0.051)
Wp    1.50     0.439    0.146    0.130       2.62     0.375    0.194    0.168
      (1.27)   (0.032)  (0.037)  (0.032)     (1.20)   (0.031)  (0.032)  (0.029)

The large difference between the inconsistent OLS and the other estimates suggests the bias discussed earlier. On the other hand, the incorrect sign on the LIML and FIML estimate of the coefficient on P and the even larger difference of the coefficient on P−1 in the C equation are striking. Assuming that the equation is properly specified, these anomalies would likewise be attributed to finite sample variation, because LIML and 2SLS are asymptotically equivalent. The GMM estimator is also striking. The estimated standard errors are noticeably smaller for all the coefficients. It should be noted, however, that this estimator is based on a presumption of heteroscedasticity when in this time series, there is little evidence of its presence. The results are broadly suggestive, but the appearance of having achieved something for nothing is deceiving. Our earlier results on the efficiency of 2SLS are intact. If there is heteroscedasticity, then 2SLS is no longer fully efficient, but, then again, neither is H2SLS. The latter is more efficient than the former in the presence of heteroscedasticity, but it is equivalent to 2SLS in its absence.

Intuition would suggest that systems methods, 3SLS, GMM, and FIML, are to be preferred to single-equation methods, 2SLS and LIML. Indeed, since the advantage is so transparent, why would one ever choose a single-equation estimator? The proper analogy is to the use of single-equation OLS versus GLS in the SURE model of Chapter 14. An obvious practical consideration is the computational simplicity of the single-equation methods. But the current state of available software has all but eliminated this advantage.

Although the systems methods are asymptotically better, they have two problems. First, any specification error in the structure of the model will be propagated throughout the system by 3SLS or FIML. The limited-information estimators will, by and large, confine a problem to the particular equation in which it appears. Second, in the same fashion as the SURE model, the finite-sample variation of the estimated covariance matrix is transmitted throughout the system. Thus, the finite-sample variance of 3SLS may well be as large as or larger than that of 2SLS. Although they are only single estimates, the results for Klein's Model I give a striking example. The upshot would appear to be that the advantage of the systems estimators in finite samples may be more modest than the asymptotic results would suggest. Monte Carlo studies of the issue have tended to reach the same conclusion.[26]

[26] See Cragg (1967) and the many related studies listed by Judge et al. (1985, pp. 646–653).

15.8 SPECIFICATION TESTS

In a strident criticism of structural estimation, Liu (1960) argued that all simultaneous-equations models of the economy were truly unidentified and that only reduced forms could be estimated. Although his criticisms may have been exaggerated (and never gained wide acceptance), modelers have been interested in testing the restrictions that overidentify an econometric model.

The first procedure for testing the overidentifying restrictions in a model was developed by Anderson and Rubin (1950). Their likelihood ratio test statistic is a by-product of LIML estimation:

$$LR = \chi^2[K_j^* - M_j] = T(\lambda_j - 1),$$

where $\lambda_j$ is the root used to find the LIML estimator. [See (15-25).] The statistic has a limiting chi-squared distribution with degrees of freedom equal to the number of overidentifying restrictions. A large value is taken as evidence that there are exogenous variables in the model that have been inappropriately omitted from the equation being examined. If the equation is exactly identified, then $K_j^* - M_j = 0$, but at the same time, the root will be 1. An alternative based on the Lagrange multiplier principle was proposed by Hausman (1983, p. 433). Operationally, the test requires only the calculation of $TR^2$, where the $R^2$ is the uncentered $R^2$ in the regression of $\hat\varepsilon_j = y_j - Z_j\hat\delta_j$ on all the predetermined variables in the model. The estimated parameters may be computed using 2SLS, LIML, or any other efficient limited-information estimator. The statistic has a limiting chi-squared distribution with $K_j^* - M_j$ degrees of freedom under the assumed specification of the model.

Another specification error occurs if the variables assumed to be exogenous in the system are, in fact, correlated with the structural disturbances. Since all the asymptotic properties claimed earlier rest on this assumption, this specification error would be quite serious. Several authors have studied this issue.[27] The specification test devised by Hausman that we used in Section 5.5 in the errors in variables model provides a method of testing for exogeneity in a simultaneous-equations model. Suppose that the variable $x^e$ is in question. The test is based on the existence of two estimators, say $\hat\delta$ and $\hat\delta^*$, such that

under $H_0$: ($x^e$ is exogenous), both $\hat\delta$ and $\hat\delta^*$ are consistent and $\hat\delta^*$ is asymptotically efficient,
under $H_1$: ($x^e$ is endogenous), $\hat\delta$ is consistent, but $\hat\delta^*$ is inconsistent.

Hausman bases his version of the test on $\hat\delta$ being the 2SLS estimator and $\hat\delta^*$ being the 3SLS estimator. A shortcoming of the procedure is that it requires an arbitrary choice of some equation that does not contain $x^e$ for the test. For instance, consider the exogeneity of X−1 in the third equation of Klein's Model I. To apply this test, we must use one of the other two equations.

[27] Wu (1973), Durbin (1954), Hausman (1978), Nakamura and Nakamura (1981) and Dhrymes (1994).

A single-equation version of the test has been devised by Spencer and Berk (1981). We suppose that $x^e$ appears in equation j, so that

$$y_j = Y_j\gamma_j + X_j\beta_j + x^e\theta + \varepsilon_j = [Y_j, X_j, x^e]\delta_j + \varepsilon_j.$$

Then $\hat\delta^*$ is the 2SLS estimator, treating $x^e$ as an exogenous variable in the system, whereas $\hat\delta$ is the IV estimator based on regressing $y_j$ on $Y_j$, $X_j$, $\hat x^e$, where the least squares fitted values are based on all the remaining exogenous variables, excluding $x^e$. The test statistic is then

$$w = (\hat\delta^* - \hat\delta)'\left\{\text{Est. Var}[\hat\delta] - \text{Est. Var}[\hat\delta^*]\right\}^{-1}(\hat\delta^* - \hat\delta), \tag{15-35}$$

which is the Wald statistic based on the difference of the two estimators. The statistic has one degree of freedom. (The extension to a set of variables is direct.)

Example 15.8 Testing Overidentifying Restrictions
For Klein's Model I, the test statistics and critical values for the chi-squared distribution for the overidentifying restrictions for the three equations are given in Table 15.4. There are 20 observations used to estimate the model and eight predetermined variables. The overidentifying restrictions for the wage equation are rejected by both single-equation tests. There are two possibilities. The equation may well be misspecified. Or, as Liu suggests, in a dynamic model, if there is autocorrelation of the disturbances, then the treatment of lagged endogenous variables as if they were exogenous is a specification error.

TABLE 15.4 Test Statistics and Critical Values

               λ       LR      TR²     K*_j − M_j
Consumption    1.499   9.98    8.77    2
Investment     1.086   1.72    1.81    3
Wages          2.466   29.3    12.49   3

Chi-Squared Critical Values
       χ²[2]   χ²[3]
5%     5.99    7.82
1%     9.21    11.34

The results above suggest a specification problem in the third equation of Klein's Model I. To pursue that finding, we now apply the preceding to test the exogeneity of X−1. The two estimated parameter vectors are

$\hat\delta^*$ = [1.5003, 0.43886, 0.14667, 0.13040] (i.e., 2SLS)

and

$\hat\delta$ = [1.2524, 0.42277, 0.167614, 0.13062].

Using the Wald criterion, the chi-squared statistic is 1.3977. Thus, the hypothesis (such as it is) is not rejected.

15.9 PROPERTIES OF DYNAMIC MODELS

In models with lagged endogenous variables, the entire previous time path of the exogenous variables and disturbances, not just their current values, determines the current value of the endogenous variables. The intrinsic dynamic properties of the autoregressive model, such as stability and the existence of an equilibrium value, are embodied in their autoregressive parameters. In this section, we are interested in long- and short-run multipliers, stability properties, and simulated time paths of the dependent variables.

15.9.1 DYNAMIC MODELS AND THEIR MULTIPLIERS

The structural form of a dynamic model is

$$y_t'\Gamma + x_t'B + y_{t-1}'\Phi = \varepsilon_t'. \tag{15-36}$$

If the model contains additional lags, then we can add additional equations to the system of the form $y_{t-1}' = y_{t-1}'$. For example, a model with two periods of lags would be written

$$[y_t' \;\; y_{t-1}']\begin{bmatrix} \Gamma & 0 \\ 0 & I \end{bmatrix} + x_t'[B \;\; 0] + [y_{t-1}' \;\; y_{t-2}']\begin{bmatrix} \Phi_1 & -I \\ \Phi_2 & 0 \end{bmatrix} = [\varepsilon_t' \;\; 0']$$

which can be treated as a model with only a single lag; that is, it is in the form of (15-36). The reduced form is

$$y_t' = x_t'\Pi + y_{t-1}'\Delta + v_t',$$

where

$$\Pi = -B\Gamma^{-1}$$

and

$$\Delta = -\Phi\Gamma^{-1}.$$

From the reduced form,

$$\frac{\partial y_{t,m}}{\partial x_{t,k}} = \Pi_{km}.$$

The short-run effects are the coefficients on the current xs, so $\Pi$ is the matrix of impact multipliers. By substituting for $y_{t-1}$ in the reduced form, we obtain

$$y_t' = x_t'\Pi + x_{t-1}'\Pi\Delta + y_{t-2}'\Delta^2 + (v_t' + v_{t-1}'\Delta).$$

(This manipulation can easily be done with the lag operator, as shown in Section 19.2.2, but it is just as convenient to proceed in this fashion for the present.) Continuing this method for the full $t$ periods, we obtain

$$y_t' = \sum_{s=0}^{t-1}[x_{t-s}'\Pi\Delta^s] + y_0'\Delta^t + \sum_{s=0}^{t-1} v_{t-s}'\Delta^s. \tag{15-37}$$

This shows how the initial conditions $y_0$ and the subsequent time path of the exogenous variables and disturbances completely determine the current values of the endogenous variables. The coefficient matrices in the bracketed sum are the dynamic multipliers,

$$\frac{\partial y_{t,m}}{\partial x_{t-s,k}} = (\Pi\Delta^s)_{km}.$$

The cumulated multipliers are obtained by adding the
matrices of dynamic multipliers. If we let $s$ go to infinity in (15-37), then we obtain the final form of the model,[28]

$$y_t' = \sum_{s=0}^{\infty}[x_{t-s}'\Pi\Delta^s] + \sum_{s=0}^{\infty}[v_{t-s}'\Delta^s].$$

[28] In some treatments, (15-37) is labeled the final form instead. Both forms eliminate the lagged values of the dependent variables from the current value. The dependence of the first form on the initial values may make it simpler to interpret than the second form.

Assume for the present that $\lim_{t\to\infty}\Delta^t = 0$. (This says that the powers of $\Delta$ die out; the condition under which this happens is derived in Section 15.9.2.) Then the matrix of cumulated multipliers in the final form is

$$\Pi[I + \Delta + \Delta^2 + \cdots] = \Pi[I - \Delta]^{-1}.$$

These coefficient matrices are the long-run or equilibrium multipliers. We can also obtain the cumulated multipliers for $s$ periods as

$$\text{cumulated multipliers} = \Pi[I - \Delta]^{-1}[I - \Delta^s].$$

Suppose that the values of $x$ were permanently fixed at $\bar x$. Then the final form shows that if there are no disturbances, the equilibrium value of $y_t$ would be

$$\bar y' = \sum_{s=0}^{\infty}[\bar x'\Pi\Delta^s] = \bar x'\Pi\sum_{s=0}^{\infty}\Delta^s = \bar x'\Pi[I - \Delta]^{-1}. \tag{15-38}$$

Therefore, the equilibrium multipliers are

$$\frac{\partial\bar y_m}{\partial\bar x_k} = [\Pi(I - \Delta)^{-1}]_{km}.$$

Some examples are shown below for Klein's Model I.

15.9.2 STABILITY

It remains to be shown that the matrix of multipliers in the final form converges. For the analysis to proceed, it is necessary for the matrix $\Delta^t$ to converge to a zero matrix. Although $\Delta$ is not a symmetric matrix, it will still have a spectral decomposition of the form

$$\Delta = C\Lambda C^{-1}, \tag{15-39}$$

where $\Lambda$ is a diagonal matrix containing the characteristic roots of $\Delta$ and each column of $C$ is a right characteristic vector,

$$\Delta c_m = \lambda_m c_m. \tag{15-40}$$

Since $\Delta$ is not symmetric, the elements of $\Lambda$ (and $C$) may be complex. Nonetheless, (A-105) continues to hold:

$$\Delta^2 = C\Lambda C^{-1}C\Lambda C^{-1} = C\Lambda^2 C^{-1} \tag{15-41}$$

and

$$\Delta^t = C\Lambda^t C^{-1}.$$

It is apparent that whether or not $\Delta^t$ vanishes as $t \to \infty$ depends on its characteristic roots. The condition is $|\lambda_m| < 1$. For the case of a complex root, $|\lambda_m| = |a + bi| = \sqrt{a^2 + b^2}$. For a given model, the stability may be established by examining the largest or dominant root.

With many endogenous variables in the model but only a few lagged variables, $\Delta$ is a large but sparse matrix. Finding the characteristic roots of large, asymmetric matrices is a rather complex computation problem (although there exists specialized software for doing so). There is a way to make the problem a bit more compact. In the context of an example, in Klein's Model I, $\Delta$ is 6 × 6, but with three rows of zeros
, it has only rank three and three nonzero roots. (See Table 15.5 in Example 15.9 following.) The following partitioning is useful. Let $y_{t1}$ be the set of endogenous variables that appear in both current and lagged form, and let $y_{t2}$ be those that appear only in current form. Then the model may be written

$$[y_{t1}' \;\; y_{t2}'] = x_t'[\Pi_1 \;\; \Pi_2] + [y_{t-1,1}' \;\; y_{t-1,2}']\begin{bmatrix} \Delta_1 & \Delta_2 \\ 0 & 0 \end{bmatrix} + [v_{t1}' \;\; v_{t2}']. \tag{15-42}$$

The characteristic roots of $\Delta$ are defined by the characteristic polynomial, $|\Delta - \lambda I| = 0$. For the partitioned model, this result is

$$\begin{vmatrix} \Delta_1 - \lambda I & \Delta_2 \\ 0 & -\lambda I \end{vmatrix} = 0.$$

We may use (A-72) to obtain

$$|\Delta - \lambda I| = (-\lambda)^{M_2}|\Delta_1 - \lambda I| = 0,$$

where $M_2$ is the number of variables in $y_2$. Consequently, we need only concern ourselves with the submatrix of $\Delta$ that defines explicit autoregressions. The part of the reduced form defined by $y_{t2}' = x_t'\Pi_2 + y_{t-1,1}'\Delta_2$ is not directly relevant.

15.9.3 ADJUSTMENT TO EQUILIBRIUM

The adjustment of a dynamic model to an equilibrium involves the following conceptual experiment. We assume that the exogenous variables $x_t$ have been fixed at a level $\bar x$ for a long enough time that the endogenous variables have fully adjusted to their equilibrium $\bar y$ [defined in (15-38)]. In some arbitrarily chosen period, labeled period 0, an exogenous one-time shock hits the system, so that in period $t = 0$, $x_t = x_0 \neq \bar x$. Thereafter, $x_t$ returns to its former value $\bar x$, and $x_t = \bar x$ for all $t > 0$. We know from the expression for the final form that, if disturbed, $y_t$ will ultimately return to the equilibrium. That situation is ensured by the stability condition. Here we consider the time path of the adjustment. Since our only concern at this point is with the exogenous shock, we will ignore the disturbances in the analysis.

At time 0, $y_0' = x_0'\Pi + y_{-1}'\Delta$. But prior to time 0, the system was in equilibrium, so $y_0' = x_0'\Pi + \bar y'\Delta$. The initial displacement due to the shock to $\bar x$ is

$$y_0' - \bar y' = x_0'\Pi - \bar y'(I - \Delta).$$

Substituting $\bar x'\Pi = \bar y'(I - \Delta)$ produces

$$y_0' - \bar y' = (x_0' - \bar x')\Pi. \tag{15-43}$$

As might be expected, the initial displacement is determined entirely by the exogenous shock occurring in that period. Since $x_t = \bar x$ after period 0, (15-37) implies that

$$\begin{aligned} y_t' &= \sum_{s=0}^{t-1}\bar x'\Pi\Delta^s + y_0'\Delta^t \\ &= \bar x'\Pi(I - \Delta)^{-1}(I - \Delta^t) + y_0'\Delta^t \\ &= \bar y' - \bar y'\Delta^t + y_0'\Delta^t \\ &= \bar y' + (y_0' - \bar y')\Delta^t. \end{aligned}$$

Thus, the entire time path is a function of the initial displacement. By inserting (15-43), we see that

$$y_t' = \bar y' + (x_0' - \bar x')\Pi\Delta^t. \tag{15-44}$$

Since $\lim_{t\to\infty}\Delta^t = 0$, the path back to the equilibrium subsequent to the exogenous shock $(x_0 - \bar x)$ is defined. The stability condition imposed on $\Delta$ ensures that if the system is disturbed at some point by a one-time shock, then barring further shocks or disturbances, it will return to its equilibrium. Since $y_0$, $\bar x$, $x_0$, and $\Pi$ are fixed for all time, the shape of the path is completely determined by the behavior of $\Delta^t$, which we now examine.

In the preceding section, in (15-39) to (15-42), we used the characteristic roots of $\Delta$ to infer the (lack of) stability of the model. The spectral decomposition of $\Delta^t$ given in (15-41) may be written

$$\Delta^t = \sum_{m=1}^M \lambda_m^t c_m d_m',$$

where $c_m$ is the mth column of $C$ and $d_m'$ is the mth row of $C^{-1}$.[29] Inserting this result in (15-44) gives

$$(y_t - \bar y)' = [(x_0 - \bar x)'\Pi]\sum_{m=1}^M \lambda_m^t c_m d_m' = \sum_{m=1}^M \lambda_m^t[(x_0 - \bar x)'\Pi c_m d_m'] = \sum_{m=1}^M \lambda_m^t g_m'.$$

(Note that this equation may involve fewer than $M$ terms, since some of the roots may be zero. For Klein's Model I, $M = 6$, but there are only three nonzero roots.) Since $g_m$ depends only on the initial conditions and the parameters of the model, the behavior of the time path of $(y_t - \bar y)$ is completely determined by $\lambda_m^t$. In each period, the deviation from the equilibrium is a sum of $M$ terms of powers of $\lambda_m$ times a constant. (Each variable has its own set of constants.) The terms in the sum behave as follows:

$\lambda_m$ real > 0, $\lambda_m^t$ adds a damped exponential term,
$\lambda_m$ real < 0, $\lambda_m^t$ adds a damped sawtooth term,
$\lambda_m$ complex, $\lambda_m^t$ adds a damped sinusoidal term.

If we write the complex root $\lambda_m = a + bi$ in polar form, then $\lambda = A[\cos B + i\sin B]$, where $A = [a^2 + b^2]^{1/2}$ and $B = \arccos(a/A)$ (in radians), the sinusoidal components each have amplitude $A^t$ and period $2\pi/B$.[30]

[29] See Section A.6.9.
[30] Goldberger (1964, p. 378).

Example 15.9 Dynamic Model
The 2SLS estimates of the structure and reduced form of Klein's Model I are given in Table 15.5. (Only the nonzero rows of $\hat\Phi$ and $\hat\Delta$ are shown.) For the 2SLS estimates of Klein's Model I, the relevant submatrix of $\hat\Delta$ is

$\hat\Delta_1$ =
          X        P        K
X−1       0.172    −0.051   −0.008
P−1       1.511    0.848    0.743
K−1       −0.287   −0.161   0.818

TABLE 15.5 2SLS Estimates of Coefficient Matrices in Klein's Model I

                           Equation
      Variable   C         I         Wp        X        P        K
Γ̂     C          1         0         0         −1       0        0
      I          0         1         0         −1       0        −1
      Wp         −0.810    0         1         0        1        0
      X          0         0         −0.439    1        −1       0
      P          −0.017    −0.150    0         0        1        0
      K          0         0         0         0        0        1

B̂     1          −16.555   −20.278   −1.5      0        0        0
      Wg         −0.810    0         0         0        0        0
      T          0         0         0         0        1        0
      G          0         0         0         −1       0        0
      A          0         0         −0.13     0        0        0

Φ̂     X−1        0         0         −0.147    0        0        0
      P−1        −0.216    −0.616    0         0        0        0
      K−1        0         0.158     0         0        0        −1

Π̂     1          42.80     25.83     31.63     68.63    37.00    25.83
      Wg         1.35      0.124     0.646     1.47     0.825    0.125
      T          −0.128    −0.176    −0.133    −0.303   −1.17    −0.176
      G          0.663     0.153     0.797     1.82     1.02     0.153
      A          0.159     −0.007    0.197     0.152    −0.045   −0.007

Δ̂     X−1        0.179     −0.008    0.222     0.172    −0.051   −0.008
      P−1        0.767     0.743     0.663     1.511    0.848    0.743
      K−1        −0.105    −0.182    −0.125    −0.287   −0.161   0.818

The characteristic roots of this matrix are 0.2995 and the complex pair 0.7692 ± 0.3494i = 0.8448[cos 0.4263 ± i sin 0.4263]. The moduli of the complex roots are 0.8448, so we conclude that the model is stable. The period for the oscillations is 2π/0.4263 = 14.73 periods (years). (See Figure 15.2.)

For a particular variable or group of variables, the various multipliers are submatrices of the multiplier matrices. The dynamic multipliers based on the estimates in Table 15.5 for the effects of the policy variables T and G on output, X, are plotted in Figure 15.2 for current and 20 lagged values. A plot of the period multipliers against the lag length is called the impulse response function. The policy effects on output are shown in Figure 15.2. The damped sine wave pattern is characteristic of a dynamic system with complex roots. When the roots are real, the impulse response function is a monotonically declining function, instead. This model has the interesting feature that the long-run multipliers of both policy variables for investment are zero. This is intrinsic to the model. The estimated long-run balanced-budget multiplier for equal increases in spending and taxes is 2.10 + (−1.48) = 0.62.

[FIGURE 15.2 Impulse Response Function. Series: Taxes, Spending. Horizontal axis: Lag, 0 to 20.]

15.10 SUMMARY AND CONCLUSIONS

The models surveyed in this chapter involve most of the issues that arise in analysis of linear equations in econometrics. Before one embarks on the process of estimation, it is necessary to establish that the sample data actually contain sufficient information to provide estimates of the parameters in question. This is the question of identification. Identification involves both the statistical properties of estimators and the role of theory in the specification of the model. Once identification is established, there are numerous methods of estimation. We considered a number of single equation techniques including
least squares, instrumental variables, GMM, and maximum likelihood. Fully efficient use of the sample data will require joint estimation of all the equations in the system. Once again, there are several techniques; these are extensions of the single equation methods, including three stage least squares, GMM, and full information maximum likelihood. In both frameworks, this is one of those benign situations in which the computationally simplest estimator is generally the most efficient one. In the final section of this chapter, we examined the special properties of dynamic models. An important consideration in this analysis was the stability of the equations. Modern macroeconometrics involves many models in which one or more roots of the dynamic system equal one, so that these models, in the simple autoregressive form, are unstable. In terms of the analysis in Section 15.9.3, in such a model, a shock to the system is permanent; the effects do not die out. We will examine a model of monetary policy with these characteristics in Example 19.6.8.

Key Terms and Concepts

• Admissible
• Behavioral equation
• Causality
• Complete system
• Completeness condition
• Consistent estimates
• Cumulative multiplier
• Dominant root
• Dynamic model
• Dynamic multiplier
• Econometric model
• Endogenous
• Equilibrium condition
• Equilibrium multipliers
• Exactly identified model
• Exclusion restrictions
• Exogenous
• FIML
• Final form
• Full information
• Fully recursive model
• GMM estimation
• Granger causality
• Identification
• Impact multiplier
• Impulse response function
• Indirect least squares
• Initial conditions
• Instrumental variable estimator
• Interdependent
• Jointly dependent
• k class
• Least variance ratio
• Limited information
• LIML
• Nonlinear system
• Nonsample information
• Nonstructural
• Normalization
• Observationally equivalent
• Order condition
• Overidentification
• Predetermined variable
• Problem of identification
• Rank condition
• Recursive model
• Reduced form
• Reduced-form disturbance
• Restrictions
• Simultaneous-equations bias
• Specification test
• Stability
• Structural disturbance
• Structural equation
• System methods of estimation
• Three-stage least squares
• Triangular system
• Two-stage least squares
• Weakly exogenous

Exercises

1. Consider the following two-equation model:

$$y_1 = \gamma_1 y_2 + \beta_{11} x_1 + \beta_{21} x_2 + \beta_{31} x_3 + \varepsilon_1,$$
$$y_2 = \gamma_2 y_1 + \beta_{12} x_1 + \beta_{22} x_2 + \beta_{32} x_3 + \varepsilon_2.$$

a. Verify that, as stated, neither equation is identified.
b. Establish whether or not the following restrictions are sufficient to identify (or partially identify) the model:
(1) $\beta_{21} = \beta_{32} = 0$,
(2) $\beta_{12} = \beta_{22} = 0$,
(3) $\gamma_1 = 0$,
(4) $\gamma_1 = \gamma_2$ and $\beta_{32} = 0$,
(5) $\sigma_{12} = 0$ and $\beta_{31} = 0$,
(6) $\gamma_1 = 0$ and $\sigma_{12} = 0$,
(7) $\beta_{21} + \beta_{22} = 1$,
(8) $\sigma_{12} = 0$, $\beta_{21} = \beta_{22} = \beta_{31} = \beta_{32} = 0$,
(9) $\sigma_{12} = 0$, $\beta_{11} = \beta_{21} = \beta_{22} = \beta_{31} = \beta_{32} = 0$.

2. Verify the rank and order conditions for identification of the second and third behavioral equations in Klein's Model I.

3. Check the identifiability of the parameters of the following model:

$$[y_1 \; y_2 \; y_3 \; y_4] \begin{bmatrix} 1 & \gamma_{12} & 0 & 0 \\ \gamma_{21} & 1 & \gamma_{23} & \gamma_{24} \\ 0 & \gamma_{32} & 1 & \gamma_{34} \\ \gamma_{41} & \gamma_{42} & 0 & 1 \end{bmatrix} = [x_1 \; x_2 \; x_3 \; x_4 \; x_5] \begin{bmatrix} 0 & \beta_{12} & \beta_{13} & \beta_{14} \\ \beta_{21} & 1 & 0 & \beta_{24} \\ \beta_{31} & \beta_{32} & \beta_{33} & 0 \\ 0 & 0 & \beta_{43} & \beta_{44} \\ 0 & \beta_{52} & 0 & 0 \end{bmatrix} + [\varepsilon_1 \; \varepsilon_2 \; \varepsilon_3 \; \varepsilon_4].$$

4. Obtain the reduced form for the model in Exercise 1 under each of the assumptions made in part a and in parts b1 and b9.

5. The following model is specified:

$$y_1 = \gamma_1 y_2 + \beta_{11} x_1 + \varepsilon_1,$$
$$y_2 = \gamma_2 y_1 + \beta_{22} x_2 + \beta_{32} x_3 + \varepsilon_2.$$

All variables are measured as deviations from their means. The sample of 25 observations produces the following matrix of sums of squares and cross products:

        y1    y2    x1    x2    x3
y1      20     6     4     3     5
y2       6    10     3     6     7
x1       4     3     5     2     3
x2       3     6     2    10     8
x3       5     7     3     8    15

a. Estimate the two equations by OLS.
b. Estimate the parameters of the two equations by 2SLS. Also estimate the asymptotic covariance matrix of the 2SLS estimates.
c. Obtain the LIML estimates of the parameters of the first equation.
d. Estimate the two equations by 3SLS.
e. Estimate the reduced-form coefficient matrix by OLS and indirectly by using your structural estimates from Part b.

6. For the model

$$y_1 = \gamma_1 y_2 + \beta_{11} x_1 + \beta_{21} x_2 + \varepsilon_1,$$
$$y_2 = \gamma_2 y_1 + \beta_{32} x_3 + \beta_{42} x_4 + \varepsilon_2,$$

show that there are two restrictions on the reduced-form coefficients. Describe a procedure for estimating the model while incorporating the restrictions.

7. An updated version of Klein's Model I was estimated. The relevant submatrix of $\boldsymbol{\Delta}$ is

$$\boldsymbol{\Delta}_1 = \begin{bmatrix} -0.1899 & -0.9471 & -0.8991 \\ 0 & 0.9287 & 0 \\ -0.0656 & -0.0791 & 0.0952 \end{bmatrix}.$$

Is the model stable?

8. Prove that

$$\operatorname{plim} \frac{\mathbf{Y}_j'\boldsymbol{\varepsilon}_j}{T} = \boldsymbol{\omega}_{\cdot j} - \boldsymbol{\Omega}_{jj}\boldsymbol{\gamma}_j.$$

9. Prove that an underidentified equation cannot be estimated by 2SLS.

16 ESTIMATION FRAMEWORKS IN ECONOMETRICS

16.1 INTRODUCTION

This chapter begins our treatment of methods of estimation. Contemporary econometrics offers the practitioner a remarkable variety of estimation methods, ranging from tightly parameterized likelihood based techniques at one end to thinly stated nonparametric methods that assume little more than mere association between variables at the other, and a rich variety in between. Even the experienced researcher could be forgiven for wondering how they should choose from this long menu. It is certainly beyond our scope to answer this question here, but a few principles can be suggested. Recent research has leaned when possible toward methods that require few (or fewer) possibly unwarranted or improper assumptions. This explains the ascendance of the GMM estimator in situations where strong likelihood-based parameterizations can be avoided and robust estimation can be done in the presence of heteroscedasticity and serial correlation. (It is intriguing to observe that this is occurring at a time when advances in computation have helped bring about increased acceptance of very heavily parameterized Bayesian methods.)

As a general proposition, the progression from full to semi- to non-parametric estimation relaxes strong assumptions, but at the cost of weakening the conclusions that can be drawn from the data. As much as anywhere else, this is clear in the analysis of discrete choice models, which provide one of the most active literatures in the field. (A sampler appears in Chapter 21.) A formal probit or logit model allows estimation of probabilities, marginal effects, and a host of ancillary results, but at the cost of imposing the normal or logistic distribution on the data. Semiparametric and nonparametric estimators allow one to relax the restriction, but often provide, in return, only ranges of probabilities, if that, and in many cases, preclude estimation of probabilities or useful marginal effects. One does have the virtue of robustness in the conclusions, however. [See, e.g., the symposium in Angrist (2001) for a spirited discussion on these points.]

The properties of estimators are another arena in which the different approaches can be compared. Within a class of estimators, one can define "the best" (most efficient) means of using the data. (See Example 16.2 below for an application.) Sometimes comparisons can be made across classes as well. For example, when they are estimating the same parameters (this remains to be established), the best parametric estimator will generally outperform the best semiparametric estimator. That is the value of the information, of course. The other side of the comparison, however, is that the semiparametric estimator will carry the day if the parametric model is misspecified in a fashion to which the semiparametric estimator is robust (and the parametric model is not).

Schools of thought have entered this conversation for a long time. Proponents of Bayesian estimation often took an almost theological viewpoint in their criticism of their classical colleagues. [See, for example, Poirier (1995).] Contemporary practitioners are usually more pragmatic than this. Bayesian estimation has gained currency as a set of techniques that can, in very many cases, provide both elegant and tractable solutions to problems that have heretofore been out of reach. Thus, for example, the simulation-based estimation advocated in the many papers of Chib and Greenberg (e.g., 1996) has provided solutions to a variety of computationally challenging problems.¹ Arguments as to the methodological virtue of one approach or the other have received much less attention than before.

Chapters 2 through 9 of this book have focused on the classical regression model and a particular estimator, least squares (linear and nonlinear). In this and the next two chapters, we will examine several general estimation strategies that are used in a wide variety of situations. This chapter will survey a few methods in the three broad areas we have listed, including Bayesian methods. Chapter 17 presents the method of maximum likelihood, the broad platform for parametric, classical estimation in econometrics. Chapter 18 discusses the generalized method of moments, which has emerged as the centerpiece of semiparametric estimation. Sections 16.2.4 and 17.8 will examine two specific estimation frameworks, one Bayesian and one classical, that are based on simulation methods. This is a recently developed body of techniques that have been made feasible by advances in estimation technology and which has made quite straightforward many estimators which were previously only scarcely used because of the sheer difficulty of the computations.

The list of techniques presented here is far from complete. We have chosen a set that constitute the mainstream of econometrics. Certainly there are others that might be considered. [See, for example, Mittelhammer, Judge, and Miller (2000) for a lengthy catalog.] Virtually all of them are the subject of excellent monographs. In this chapter we will present several applications, some from the literature, some homegrown, to demonstrate the range of techniques that are current in econometric practice. We begin in Section 16.2 with parametric approaches, primarily maximum likelihood. Since this is the subject of much of the remainder of this book, this section is brief. Section 16.2 also presents Bayesian estimation, which in its traditional form, is as heavily parameterized as maximum likelihood estimation. This section focuses mostly on the linear model. A few applications of Bayesian techniques to other models are presented as well. We will also return to what is currently the standard toolkit in Bayesian estimation, Markov Chain Monte Carlo methods, in Section 16.2.4. Section 16.2.3 presents an emerging technique in the classical tradition, latent class modeling, which makes interesting use of a fundamental result based on Bayes Theorem. Section 16.3 is on semiparametric estimation. GMM estimation is the subject of all of Chapter 18, so it is only introduced here. The technique of least absolute deviations is presented here as well. A range of applications from the recent literature is also surveyed. Section 16.4 describes nonparametric estimation. The fundamental tool, the kernel density estimator, is developed, then applied to a problem in regression analysis. Two applications are presented here as well. Being focused on application, this chapter will say very little about the statistical theory for these techniques, such as their asymptotic properties. (The results are developed at length in the literature, of course.) We will turn to the subject of the properties of estimators briefly at the end of the chapter, in Section 16.5, then in greater detail in Chapters 17 and 18.

1 The penetration of Bayesian econometrics could be overstated. It is fairly well represented in the current journals such as the Journal of Econometrics, Journal of Applied Econometrics, Journal of Business and Economic Statistics, and so on. On the other hand, in the six major general treatments of econometrics published in 2000, four (Hayashi, Ruud, Patterson, Davidson) do not mention Bayesian methods at all, a buffet of 32 essays (Baltagi) devotes only one to the subject, and the one that displays any preference (Mittelhammer et al.) devotes nearly 10 percent (70) of its pages to Bayesian estimation, but all to the broad metatheory or the linear regression model and none to the more elaborate applications that form the received applications in the many journals in the field.

16.2 PARAMETRIC ESTIMATION AND INFERENCE

Parametric estimation departs from a full statement of the density or probability model that provides the data generating mechanism for a random variable of interest. For the sorts of applications we have considered thus far, we might say that the joint density of a scalar random variable, "$y$," and a random vector, "$\mathbf{x}$," of interest can be specified by

$$f(y, \mathbf{x}) = g(y \mid \mathbf{x}, \boldsymbol{\beta}) \times h(\mathbf{x} \mid \boldsymbol{\theta}) \tag{16-1}$$

with unknown parameters $\boldsymbol{\beta}$ and $\boldsymbol{\theta}$. To continue the application that has occupied us since Chapter 2, consider the linear regression model with normally distributed disturbances. The assumption produces a full statement of the conditional density that is the population from which an observation is drawn:

$$y_i \mid \mathbf{x}_i \sim N[\mathbf{x}_i'\boldsymbol{\beta}, \sigma^2].$$

All that remains for a full definition of the population is knowledge of the specific values taken by the unknown but fixed parameters. With those in hand, the conditional probability distribution for $y_i$ is completely defined: mean, variance, probabilities of certain events, and so on. (The marginal density for the conditioning variables is usually not of particular interest.) Thus, the signature features of this modeling platform are specification of both the density and the features (parameters) of that density.

The parameter space for the parametric model is the set of allowable values of the parameters which satisfy some prior specification of the model. For example, in the regression model specified previously, the $K$ regression slopes may take any real value, but the variance must be a positive number. Therefore, the parameter space for that model is $[\boldsymbol{\beta}, \sigma^2] \in \mathbb{R}^K \times \mathbb{R}_+$. "Estimation" in this context consists of specifying a criterion for ranking the points in the parameter space, then choosing that point (a point estimate) or a set of points (an interval estimate) that optimizes that criterion, that is, has the best ranking. Thus, for example, we chose linear least squares as one estimation criterion for the linear model. "Inference" in this setting is a process by which some regions of the (already specified) parameter space are deemed not to contain the unknown parameters, though, in more practical terms, we typically define a criterion and then state that, by that criterion, certain regions are unlikely to contain the true parameters.

16.2.1 CLASSICAL LIKELIHOOD BASED ESTIMATION

The most common (by far) class of parametric estimators used in econometrics is the maximum likelihood estimators. The underlying philosophy of this class of estimators is the idea of "sample information." When the density of a sample of observations is completely specified, apart from the unknown parameters, then the joint density of those observations (assuming they are independent) is the likelihood function,

$$f(y_1, y_2, \ldots, \mathbf{x}_1, \mathbf{x}_2, \ldots) = \prod_{i=1}^{n} f(y_i, \mathbf{x}_i \mid \boldsymbol{\beta}, \boldsymbol{\theta}). \tag{16-2}$$

This function contains all the information available in the sample about the population from which those observations were drawn. The strategy by which that information is used in estimation constitutes the estimator.

The maximum likelihood estimator [Fisher (1925)] is that function of the data which (as its name implies) maximizes the likelihood function (or, because it is usually more convenient, the log of the likelihood function). The motivation for this approach is most easily visualized in the setting of a discrete random variable. In this case, the likelihood function gives the joint probability for the observed sample observations, and the maximum likelihood estimator is the function of the sample information which makes the observed data most probable (at least by that criterion). Though the analogy is most intuitively appealing for a discrete variable, it carries over to continuous variables as well. Since this estimator is the subject of Chapter 17, which is quite lengthy, we will defer any formal discussion until then, and consider instead two applications to illustrate the techniques and underpinnings.

Example 16.1 The Linear Regression Model

Least squares weighs negative and positive deviations equally and gives disproportionate weight to large deviations in the calculation. This property can be an advantage or a disadvantage, depending on the data-generating process. For normally distributed disturbances, this method is precisely the one needed to use the data most efficiently. If the data are generated by a normal distribution, then the log of the likelihood function is

$$\ln L = -\frac{n}{2}\ln 2\pi - \frac{n}{2}\ln \sigma^2 - \frac{1}{2\sigma^2}(\mathbf{y} - \mathbf{X}\boldsymbol{\beta})'(\mathbf{y} - \mathbf{X}\boldsymbol{\beta}).$$

You can easily show that least squares is the estimator of choice for this model. Maximizing the function means minimizing the exponent, which is done by least squares for $\boldsymbol{\beta}$ and $\mathbf{e}'\mathbf{e}/n$ for $\sigma^2$.

If the appropriate distribution is deemed to be something other than normal, perhaps on the basis of an observation that the tails of the disturbance distribution are too thick (see Example 5.1 and Section 17.6.3), then there are three ways one might proceed. First, as we have observed, the consistency of least squares is robust to this failure of the specification, so long as the conditional mean of the disturbances is still zero. Some correction to the standard errors is necessary for proper inferences. (See Section 10.3.) Second, one might want to proceed to an estimator with better finite sample properties. The least absolute deviations estimator discussed in Section 16.3.2 is a candidate. Finally, one might consider some other distribution which accommodates the observed discrepancy. For example, Ruud (2000) examines in some detail a linear regression model with disturbances distributed according to the $t$ distribution with $v$ degrees of freedom. As long as $v$ is finite, this random variable will have a larger variance than the normal. Which way should one proceed? The third approach is the least appealing. Surely if the normal distribution is inappropriate, then it would be difficult to come up with a plausible mechanism whereby the $t$ distribution would not be. The LAD estimator might well be preferable if the sample were small. If not, then least squares would probably remain the estimator of choice, with some allowance for the fact that standard inference tools would probably be misleading. Current practice is generally to adopt the first strategy.

Example 16.2 The Stochastic Frontier Model

The stochastic frontier model, discussed in detail in Section 17.6.3, is a regression-like model with a disturbance that is asymmetric and distinctly nonnormal. (See Figure 17.3.) The conditional density for the dependent variable in this model is

$$f(y \mid \mathbf{x}, \boldsymbol{\beta}, \sigma, \lambda) = \frac{\sqrt{2}}{\sigma\sqrt{\pi}} \exp\left[\frac{-(y - \alpha - \mathbf{x}'\boldsymbol{\beta})^2}{2\sigma^2}\right] \Phi\left(\frac{-\lambda(y - \alpha - \mathbf{x}'\boldsymbol{\beta})}{\sigma}\right).$$

This produces a log-likelihood function for the model,

$$\ln L = -n \ln \sigma + \frac{n}{2}\ln\frac{2}{\pi} - \frac{1}{2}\sum_{i=1}^{n}\left(\frac{\varepsilon_i}{\sigma}\right)^2 + \sum_{i=1}^{n} \ln \Phi\left(\frac{-\varepsilon_i \lambda}{\sigma}\right).$$

There are at least two fully parametric estimators for this model. The maximum likelihood estimator is discussed in Section 17.6.3. Greene (1997b) presents the following method of moments estimator: For the regression slopes, excluding the constant term, use least squares. For the parameters $\alpha$, $\sigma$, and $\lambda$, based on the second and third moments of the least squares residuals and the least squares constant, solve

$$m_2 = \sigma_v^2 + [1 - 2/\pi]\sigma_u^2,$$
$$m_3 = (2/\pi)^{1/2}[1 - 4/\pi]\sigma_u^3,$$
$$a = \alpha - (2/\pi)^{1/2}\sigma_u,$$

where $\lambda = \sigma_u/\sigma_v$ and $\sigma^2 = \sigma_u^2 + \sigma_v^2$.

Both estimators are fully parametric. The maximum likelihood estimator is so for the reasons discussed earlier. The method of moments estimators (see Section 18.2) are appropriate only for this distribution. Which is preferable? As we will see in Chapter 17, both estimators are consistent and asymptotically normally distributed. By virtue of the Cramér–Rao theorem, the maximum likelihood estimator has a smaller asymptotic variance. Neither has any small sample optimality properties. Thus, the only virtue of the method of moments estimator is that one can compute it with any standard regression/statistics computer package and a hand calculator, whereas the maximum likelihood estimator requires specialized software (only somewhat; it is reasonably common).

16.2.2 BAYESIAN ESTIMATION

Parametric formulations present a bit of a methodological dilemma. They would seem to straightjacket the researcher into a fixed and immutable specification of the model. But in any analysis, there is uncertainty as to the magnitudes and even, on occasion, the signs of coefficients. It is rare that the presentation of a set of empirical results has not been preceded by at least some exploratory analysis. Proponents of the Bayesian methodology argue that the process of "estimation" is not one of deducing the values of fixed parameters, but rather one of continually updating and sharpening our subjective beliefs about the state of the world.

The centerpiece of the Bayesian methodology is Bayes theorem: for events $A$ and $B$, the conditional probability of event $A$ given that $B$ has occurred is

$$P(A \mid B) = \frac{P(B \mid A)P(A)}{P(B)}.$$

Paraphrased for our applications here, we would write

$$P(\text{parameters} \mid \text{data}) = \frac{P(\text{data} \mid \text{parameters})P(\text{parameters})}{P(\text{data})}.$$

In this setting, the data are viewed as constants whose distributions do not involve the parameters of interest. For the purpose of the study, we treat the data as only a fixed set of additional information to be used in updating our beliefs about the parameters. [Note the similarity to the way that the joint density for our parametric model is specified in (16-1).] Thus, we write

$$P(\text{parameters} \mid \text{data}) \propto P(\text{data} \mid \text{parameters})P(\text{parameters}) = \text{Likelihood function} \times \text{Prior density}.$$

The symbol $\propto$ means "is proportional to." In the preceding equation, we have dropped the marginal density of the data, so what remains is not a proper density until it is scaled by what will be an inessential proportionality constant. The first term on the right is the joint distribution of the observed random variables $\mathbf{y}$, given the parameters. As we shall analyze it here, this distribution is the normal distribution we have used in our previous analysis; see (16-1). The second term is the prior beliefs of the analyst. The left-hand side is the posterior density of the parameters, given the current body of data, or our revised beliefs about the distribution of the parameters after "seeing" the data. The posterior is a mixture of the prior information and the "current information," that is, the data. Once obtained, this posterior density is available to be the prior density function when the next body of data or other usable information becomes available. The principle involved, which appears nowhere in the classical analysis, is one of continual accretion of knowledge about the parameters.

Traditional Bayesian estimation is heavily parameterized. The prior density and the likelihood function are crucial elements of the analysis, and both must be fully specified for estimation to proceed. The Bayesian "estimator" is the mean of the posterior density of the parameters, a quantity that is usually obtained either by integration (when closed forms exist), approximation of integrals by numerical techniques, or by Monte Carlo methods, which are discussed in Section 16.2.4.

16.2.2.a BAYESIAN ANALYSIS OF THE CLASSICAL REGRESSION MODEL

The complexity of the algebra involved in Bayesian analysis is often extremely burdensome. For the linear regression model, however, many fairly straightforward results have been obtained. To provide some of the flavor of the techniques, we present the full derivation only for some simple cases. In the interest of brevity, and to avoid the burden of excessive algebra, we refer the reader to one of the several sources that present the full derivation of the more complex cases.²

The classical normal regression model we have analyzed thus far is constructed around the conditional multivariate normal distribution $N[\mathbf{X}\boldsymbol{\beta}, \sigma^2\mathbf{I}]$. The interpretation is different here. In the sampling theory setting, this distribution embodies the information about the observed sample data given the assumed distribution and the fixed, albeit unknown, parameters of the model. In the Bayesian setting, this function summarizes the information that a particular realization of the data provides about the assumed distribution of the model parameters. To underscore that idea, we rename this joint density the likelihood for $\boldsymbol{\beta}$ and $\sigma^2$ given the data, so

$$L(\boldsymbol{\beta}, \sigma^2 \mid \mathbf{y}, \mathbf{X}) = [2\pi\sigma^2]^{-n/2} e^{-[(1/(2\sigma^2))(\mathbf{y} - \mathbf{X}\boldsymbol{\beta})'(\mathbf{y} - \mathbf{X}\boldsymbol{\beta})]}. \tag{16-3}$$

2 These sources include Judge et al. (1982, 1985), Maddala (1977a), Mittelhammer et al. (2000), and the canonical reference for econometricians, Zellner (1971). Further topics in Bayesian inference are contained in Zellner (1985). A recent treatment of both Bayesian and sampling theory approaches is Poirier (1995).

For purposes of the results below, some reformulation is useful. Let $d = n - K$ (the degrees of freedom parameter), and substitute

$$\mathbf{y} - \mathbf{X}\boldsymbol{\beta} = \mathbf{y} - \mathbf{X}\mathbf{b} - \mathbf{X}(\boldsymbol{\beta} - \mathbf{b}) = \mathbf{e} - \mathbf{X}(\boldsymbol{\beta} - \mathbf{b})$$

in the exponent. Expanding this produces

$$-\frac{1}{2\sigma^2}(\mathbf{y} - \mathbf{X}\boldsymbol{\beta})'(\mathbf{y} - \mathbf{X}\boldsymbol{\beta}) = -\frac{1}{2} d s^2 \left(\frac{1}{\sigma^2}\right) - \frac{1}{2}(\boldsymbol{\beta} - \mathbf{b})'\left(\frac{1}{\sigma^2}\mathbf{X}'\mathbf{X}\right)(\boldsymbol{\beta} - \mathbf{b}).$$

After a bit of manipulation (note that $n/2 = d/2 + K/2$), the likelihood may be written

$$L(\boldsymbol{\beta}, \sigma^2 \mid \mathbf{y}, \mathbf{X}) = [2\pi]^{-d/2}[\sigma^2]^{-d/2} e^{-(d/2)(s^2/\sigma^2)} \, [2\pi]^{-K/2}[\sigma^2]^{-K/2} e^{-(1/2)(\boldsymbol{\beta} - \mathbf{b})'[\sigma^2(\mathbf{X}'\mathbf{X})^{-1}]^{-1}(\boldsymbol{\beta} - \mathbf{b})}.$$

This density embodies all that we have to learn about the parameters from the observed data. Since the data are taken to be constants in the joint density, we may multiply this joint density by the (very carefully chosen), inessential (since it does not involve $\boldsymbol{\beta}$ or $\sigma^2$) constant function of the observations,

$$A = \frac{\left(\frac{d}{2}s^2\right)^{(d/2)+1}}{\Gamma\left(\frac{d}{2} + 1\right)} [2\pi]^{(d/2)} |\mathbf{X}'\mathbf{X}|^{-1/2}.$$

For convenience, let $v = d/2$. Then, multiplying $L(\boldsymbol{\beta}, \sigma^2 \mid \mathbf{y}, \mathbf{X})$ by $A$ gives

$$L(\boldsymbol{\beta}, \sigma^2 \mid \mathbf{y}, \mathbf{X}) \propto \frac{[vs^2]^{v+1}}{\Gamma(v+1)}\left(\frac{1}{\sigma^2}\right)^{v} e^{-vs^2(1/\sigma^2)} \, [2\pi]^{-K/2} |\sigma^2(\mathbf{X}'\mathbf{X})^{-1}|^{-1/2} \times e^{-(1/2)(\boldsymbol{\beta} - \mathbf{b})'[\sigma^2(\mathbf{X}'\mathbf{X})^{-1}]^{-1}(\boldsymbol{\beta} - \mathbf{b})}. \tag{16-4}$$

The likelihood function is proportional to the product of a gamma density for $z = 1/\sigma^2$ with parameters $\lambda = vs^2$ and $P = v + 1$ [see (B-39); this is an inverted gamma distribution] and a $K$-variate normal density for $\boldsymbol{\beta} \mid \sigma^2$ with mean vector $\mathbf{b}$ and covariance matrix $\sigma^2(\mathbf{X}'\mathbf{X})^{-1}$. The reason will be clear shortly.

The departure point for the Bayesian analysis of the model is the specification of a prior distribution. This distribution gives the analyst's prior beliefs about the parameters of the model. One of two approaches is generally taken. If no prior information is known about the parameters, then we can specify a noninformative prior that reflects that. We do this by specifying a "flat" prior for the parameter in question:³

$$g(\text{parameter}) \propto \text{constant}.$$

3 That this "improper" density might not integrate to one is only a minor difficulty. Any constant of integration would ultimately drop out of the final result. See Zellner (1971, pp. 41–53) for a discussion of noninformative priors.

There are different ways that one might characterize the lack of prior information. The implication of a flat prior is that within the range of valid values for the parameter, all intervals of equal length (hence, in principle, all values) are equally likely. The second possibility, an informative prior, is treated in the next section. The posterior density is the result of combining the likelihood function with the prior density. Since it pools the full set of information available to the analyst, once the data have been drawn, the posterior density would be interpreted the same way the prior density was before the data were obtained.

To begin, we analyze the case in which $\sigma^2$ is assumed to be known. This assumption is obviously unrealistic, and we do so only to establish a point of departure. Using Bayes Theorem, we construct the posterior density,

$$f(\boldsymbol{\beta} \mid \mathbf{y}, \mathbf{X}, \sigma^2) = \frac{L(\boldsymbol{\beta} \mid \sigma^2, \mathbf{y}, \mathbf{X})\, g(\boldsymbol{\beta} \mid \sigma^2)}{f(\mathbf{y})} \propto L(\boldsymbol{\beta} \mid \sigma^2, \mathbf{y}, \mathbf{X})\, g(\boldsymbol{\beta} \mid \sigma^2),$$

assuming that the distribution of $\mathbf{X}$ does not depend on $\boldsymbol{\beta}$ or $\sigma^2$. Since $g(\boldsymbol{\beta} \mid \sigma^2) \propto$ a constant, this density is the one in (16-4). For now, write

$$f(\boldsymbol{\beta} \mid \sigma^2, \mathbf{y}, \mathbf{X}) \propto h(\sigma^2)[2\pi]^{-K/2} |\sigma^2(\mathbf{X}'\mathbf{X})^{-1}|^{-1/2} e^{-(1/2)(\boldsymbol{\beta} - \mathbf{b})'[\sigma^2(\mathbf{X}'\mathbf{X})^{-1}]^{-1}(\boldsymbol{\beta} - \mathbf{b})}, \tag{16-5}$$

where

$$h(\sigma^2) = \frac{[vs^2]^{v+1}}{\Gamma(v+1)}\left(\frac{1}{\sigma^2}\right)^{v} e^{-vs^2(1/\sigma^2)}. \tag{16-6}$$

For the present, we treat $h(\sigma^2)$ simply as a constant that involves $\sigma^2$, not as a probability density; (16-5) is conditional on $\sigma^2$. Thus, the posterior density $f(\boldsymbol{\beta} \mid \sigma^2, \mathbf{y}, \mathbf{X})$ is proportional to a multivariate normal distribution with mean $\mathbf{b}$ and covariance matrix $\sigma^2(\mathbf{X}'\mathbf{X})^{-1}$.

This result is familiar, but it is interpreted differently in this setting. First, we have combined our prior information about $\boldsymbol{\beta}$ (in this case, no information) and the sample information to obtain a posterior distribution. Thus, on the basis of the sample data in hand, we obtain a distribution for $\boldsymbol{\beta}$ with mean $\mathbf{b}$ and covariance matrix $\sigma^2(\mathbf{X}'\mathbf{X})^{-1}$. The result is dominated by the sample information, as it should be if there is no prior information. In the absence of any prior information, the mean of the posterior distribution, which is a type of Bayesian point estimate, is the sampling theory estimator.

To generalize the preceding to an unknown $\sigma^2$, we specify a noninformative prior distribution for $\ln \sigma$ over the entire real line.⁴ By the change of variable formula, if $g(\ln \sigma)$ is constant, then $g(\sigma^2)$ is proportional to $1/\sigma^2$.⁵ Assuming that $\boldsymbol{\beta}$ and $\sigma^2$ are independent, we now have the noninformative joint prior distribution:

$$g(\boldsymbol{\beta}, \sigma^2) = g_{\beta}(\boldsymbol{\beta})\, g_{\sigma^2}(\sigma^2) \propto \frac{1}{\sigma^2}.$$

4 See Zellner (1971) for justification of this prior distribution.
5 Many treatments of this model use $\sigma$ rather than $\sigma^2$ as the parameter of interest. The end results are identical. We have chosen this parameterization because it makes manipulation of the likelihood function with a gamma prior distribution especially convenient. See Zellner (1971, pp. 44–45) for discussion.

We can obtain the joint posterior distribution for $\boldsymbol{\beta}$ and $\sigma^2$ by using

$$f(\boldsymbol{\beta}, \sigma^2 \mid \mathbf{y}, \mathbf{X}) = L(\boldsymbol{\beta} \mid \sigma^2, \mathbf{y}, \mathbf{X})\, g_{\sigma^2}(\sigma^2) \propto L(\boldsymbol{\beta} \mid \sigma^2, \mathbf{y}, \mathbf{X}) \times \frac{1}{\sigma^2}. \tag{16-7}$$

For the same reason as before, we multiply $g_{\sigma^2}(\sigma^2)$ by a well-chosen constant, this time $vs^2\,\Gamma(v+1)/\Gamma(v+2) = vs^2/(v+1)$. Multiplying (16-5) by this constant times $g_{\sigma^2}(\sigma^2)$ and inserting $h(\sigma^2)$ gives the joint posterior for $\boldsymbol{\beta}$ and $\sigma^2$, given $\mathbf{y}$ and $\mathbf{X}$:

$$f(\boldsymbol{\beta}, \sigma^2 \mid \mathbf{y}, \mathbf{X}) \propto \frac{[vs^2]^{v+2}}{\Gamma(v+2)}\left(\frac{1}{\sigma^2}\right)^{v+1} e^{-vs^2(1/\sigma^2)} \, [2\pi]^{-K/2} |\sigma^2(\mathbf{X}'\mathbf{X})^{-1}|^{-1/2} \times e^{-(1/2)(\boldsymbol{\beta} - \mathbf{b})'[\sigma^2(\mathbf{X}'\mathbf{X})^{-1}]^{-1}(\boldsymbol{\beta} - \mathbf{b})}.$$

To obtain the marginal posterior distribution for $\boldsymbol{\beta}$, it is now necessary to integrate $\sigma^2$ out of the joint distribution (and vice versa to obtain the marginal distribution for $\sigma^2$). By collecting the terms, $f(\boldsymbol{\beta}, \sigma^2 \mid \mathbf{y}, \mathbf{X})$ can be written as

$$f(\boldsymbol{\beta}, \sigma^2 \mid \mathbf{y}, \mathbf{X}) \propto A \times \left(\frac{1}{\sigma^2}\right)^{P-1} e^{-\lambda(1/\sigma^2)},$$

where

$$A = \frac{[vs^2]^{v+2}}{\Gamma(v+2)} [2\pi]^{-K/2} |(\mathbf{X}'\mathbf{X})^{-1}|^{-1/2},$$
$$P = v + 2 + K/2 = (n - K)/2 + 2 + K/2 = (n + 4)/2,$$

and

$$\lambda = vs^2 + \tfrac{1}{2}(\boldsymbol{\beta} - \mathbf{b})'\mathbf{X}'\mathbf{X}(\boldsymbol{\beta} - \mathbf{b}),$$

so the marginal posterior distribution for $\boldsymbol{\beta}$ is

$$\int_0^\infty f(\boldsymbol{\beta}, \sigma^2 \mid \mathbf{y}, \mathbf{X})\, d\sigma^2 \propto A \int_0^\infty \left(\frac{1}{\sigma^2}\right)^{P-1} e^{-\lambda(1/\sigma^2)}\, d\sigma^2.$$

To do the integration, we have to make a change of variable; $d(1/\sigma^2) = -(1/\sigma^2)^2\, d\sigma^2$, so $d\sigma^2 = -(1/\sigma^2)^{-2}\, d(1/\sigma^2)$. Making the substitution (the sign of the integral changes twice, once for the Jacobian and back again because the integral from $\sigma^2 = 0$ to $\infty$ is the negative of the integral from $(1/\sigma^2) = 0$ to $\infty$), we obtain

$$\int_0^\infty f(\boldsymbol{\beta}, \sigma^2 \mid \mathbf{y}, \mathbf{X})\, d\sigma^2 \propto A \int_0^\infty \left(\frac{1}{\sigma^2}\right)^{P-3} e^{-\lambda(1/\sigma^2)}\, d\left(\frac{1}{\sigma^2}\right) = A \times \frac{\Gamma(P-2)}{\lambda^{P-2}}.$$

Reinserting the expressions for $A$, $P$, and $\lambda$ produces

$$f(\boldsymbol{\beta} \mid \mathbf{y}, \mathbf{X}) \propto \frac{\dfrac{[vs^2]^{v+2}\,\Gamma(v + K/2)}{\Gamma(v+2)} [2\pi]^{-K/2} |(\mathbf{X}'\mathbf{X})^{-1}|^{-1/2}}{\left[vs^2 + \tfrac{1}{2}(\boldsymbol{\beta} - \mathbf{b})'\mathbf{X}'\mathbf{X}(\boldsymbol{\beta} - \mathbf{b})\right]^{v + K/2}}. \tag{16-8}$$

This density is proportional to a multivariate $t$ distribution⁶ and is a generalization of the familiar univariate distribution we have used at various points. This distribution has a degrees of freedom parameter, $d = n - K$, mean $\mathbf{b}$, and covariance matrix $(d/(d-2)) \times [s^2(\mathbf{X}'\mathbf{X})^{-1}]$. Each element of the $K$-element vector $\boldsymbol{\beta}$ has a marginal distribution that is the univariate $t$ distribution with degrees of freedom $n - K$, mean $b_k$, and variance equal to the $k$th diagonal element of the covariance matrix given earlier. Once again, this is the same as our sampling theory. The difference is a matter of interpretation. In the current context, the estimated distribution is for $\boldsymbol{\beta}$ and is centered at $\mathbf{b}$.

16.2.2.b POINT ESTIMATION

The posterior density function embodies the prior and the likelihood and therefore contains all the researcher's information about the parameters. But for purposes of presenting results, the density is somewhat imprecise, and one normally prefers a point or interval estimate. The natural approach would be to use the mean of the posterior distribution as the estimator. For the noninformative prior,
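we use $\mathbf{b}$, the sampling theory estimator.

One might ask at this point, why bother? These Bayesian point estimates are identical to the sampling theory estimates. All that has changed is our interpretation of the results. This situation is, however, exactly the way it should be. Remember that we entered the analysis with noninformative priors for $\boldsymbol{\beta}$ and $\sigma^2$. Therefore, the only information brought to bear on estimation is the sample data, and it would be peculiar if anything other than the sampling theory estimates emerged at the end. The results do change when our prior brings out of sample information into the estimates, as we shall see below.

The results will also change if we change our motivation for estimating $\boldsymbol{\beta}$. The parameter estimates have been treated thus far as if they were an end in themselves. But in some settings, parameter estimates are obtained so as to enable the analyst to make a decision. Consider then, a loss function, $H(\hat{\boldsymbol{\beta}}, \boldsymbol{\beta})$, which quantifies the cost of basing a decision on an estimate $\hat{\boldsymbol{\beta}}$ when the parameter is $\boldsymbol{\beta}$. The expected, or average loss is

$$E_{\beta}[H(\hat{\boldsymbol{\beta}}, \boldsymbol{\beta})] = \int_{\beta} H(\hat{\boldsymbol{\beta}}, \boldsymbol{\beta})\, f(\boldsymbol{\beta} \mid \mathbf{y}, \mathbf{X})\, d\boldsymbol{\beta}, \tag{16-9}$$

where the weighting function is the marginal posterior density. (The joint density for $\boldsymbol{\beta}$ and $\sigma^2$ would be used if the loss were defined over both.) The Bayesian point estimate is the parameter vector that minimizes the expected loss. If the loss function is a quadratic form in $(\hat{\boldsymbol{\beta}} - \boldsymbol{\beta})$, then the mean of the posterior distribution is the "minimum expected loss" (MELO) estimator. The proof is simple. For this case,

$$E[H(\hat{\boldsymbol{\beta}}, \boldsymbol{\beta}) \mid \mathbf{y}, \mathbf{X}] = E\left[\tfrac{1}{2}(\hat{\boldsymbol{\beta}} - \boldsymbol{\beta})'\mathbf{W}(\hat{\boldsymbol{\beta}} - \boldsymbol{\beta}) \mid \mathbf{y}, \mathbf{X}\right].$$

To minimize this, we can use the result that

$$\partial E[H(\hat{\boldsymbol{\beta}}, \boldsymbol{\beta}) \mid \mathbf{y}, \mathbf{X}]/\partial\hat{\boldsymbol{\beta}} = E[\partial H(\hat{\boldsymbol{\beta}}, \boldsymbol{\beta})/\partial\hat{\boldsymbol{\beta}} \mid \mathbf{y}, \mathbf{X}] = E[\mathbf{W}(\hat{\boldsymbol{\beta}} - \boldsymbol{\beta}) \mid \mathbf{y}, \mathbf{X}].$$

6 See, for example, Judge et al. (1985) for details. The expression appears in Zellner (1971, p. 67). Note that the exponent in the denominator is $v + K/2 = n/2$.

The minimum is found by equating this derivative to $\mathbf{0}$, whence, since the nonsingular weighting matrix $\mathbf{W}$ is irrelevant, $\hat{\boldsymbol{\beta}} = E[\boldsymbol{\beta} \mid \mathbf{y}, \mathbf{X}]$. This kind of loss function would state that errors in the positive and negative direction are equally bad, and large errors are much worse than small errors. If the loss function were a linear function instead, then the MELO estimator would be the median of the posterior distribution. These results are the same in the case of the noninformative prior that we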
have just examined.

16.2.2.c INTERVAL ESTIMATION

The counterpart to a confidence interval in this setting is an interval of the posterior distribution that contains a specified probability. Clearly, it is desirable to have this interval be as narrow as possible. For a unimodal density, this corresponds to an interval within which the density function is higher than any points outside it, which justifies the term highest posterior density (HPD) interval. For the case we have analyzed, which involves a symmetric distribution, we would form the HPD interval for $\beta$ around the least squares estimate $b$, with terminal values taken from the standard $t$ tables.

16.2.2.d ESTIMATION WITH AN INFORMATIVE PRIOR DENSITY

Once we leave the simple case of noninformative priors, matters become quite complicated, both at a practical level and, methodologically, in terms of just where the prior comes from. The integration of $\sigma^2$ out of the posterior in (16-5) is complicated by itself. It is made much more so if the prior distributions of $\beta$ and $\sigma^2$ are at all involved. Partly to offset these difficulties, researchers usually use what is called a conjugate prior, which is one that has the same form as the conditional density and is therefore amenable to the integration needed to obtain the marginal distributions.[7]

Suppose that we assume that the prior beliefs about $\beta$ may be summarized in a $K$-variate normal distribution with mean $\beta_0$ and variance matrix $\Sigma_0$. Once again, it is illuminating to begin with the case in which $\sigma^2$ is assumed to be known. Proceeding in exactly the same fashion as before, we would obtain the following result: The posterior density of $\beta$ conditioned on $\sigma^2$ and the data will be normal with

$$E[\beta \mid \sigma^2, y, X] = \left\{\Sigma_0^{-1} + [\sigma^2 (X'X)^{-1}]^{-1}\right\}^{-1} \left\{\Sigma_0^{-1}\beta_0 + [\sigma^2 (X'X)^{-1}]^{-1} b\right\} = F\beta_0 + (I - F)b, \tag{16-10}$$

where

$$F = \left\{\Sigma_0^{-1} + [\sigma^2 (X'X)^{-1}]^{-1}\right\}^{-1} \Sigma_0^{-1} = \left\{[\text{prior variance}]^{-1} + [\text{conditional variance}]^{-1}\right\}^{-1} [\text{prior variance}]^{-1}.$$

[7] Our choice of noninformative prior for $\ln\sigma$ led to a convenient prior for $\sigma^2$ in our derivation of the posterior for $\beta$. The idea that the prior can be specified arbitrarily in whatever form is mathematically convenient is very troubling; it is supposed to represent the accumulated prior belief about the parameter. On the other hand, it could be argued that the conjugate prior is the posterior of a previous analysis, which could justify its form. The issue of how priors should be specified is one of
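The matrix weighting in (16-10) is easy to mistype, so a small numerical sketch may help. It assumes Python with NumPy; the function name `normal_posterior` and its argument names are invented for this illustration, and `cond_var` stands for the conditional variance $\sigma^2(X'X)^{-1}$, taken as known. The scalar check uses the two MPC estimates and variances that the text reports in Table 16.1, with the earlier estimate playing the role of the prior.

```python
import numpy as np

def normal_posterior(beta0, Sigma0, b, cond_var):
    """Posterior mean and variance of beta for a normal prior and known sigma^2.

    beta0, Sigma0 : prior mean and prior variance matrix
    b, cond_var   : least squares estimate and sigma^2 (X'X)^{-1}
    """
    prior_prec = np.linalg.inv(np.atleast_2d(Sigma0))   # [prior variance]^{-1}
    data_prec = np.linalg.inv(np.atleast_2d(cond_var))  # [conditional variance]^{-1}
    post_var = np.linalg.inv(prior_prec + data_prec)    # inverse of summed precisions
    F = post_var @ prior_prec                           # weight on the prior mean
    I = np.eye(F.shape[0])
    post_mean = F @ np.atleast_1d(beta0) + (I - F) @ np.atleast_1d(b)
    return post_mean, post_var, F

# Scalar case: prior from the 1940-1950 row, sample from the 1950-2000 row.
mean, var, F = normal_posterior(0.6848014, 0.061878, 0.92481, 0.000065865)
print("posterior mean:", float(mean[0]))   # compare with 0.92455 in Example 16.3
print("weight on prior, F:", float(F[0, 0]))
```

The weight `F` on the prior is tiny here because the prior variance is roughly a thousand times the sampling variance, which is the sense in which the more precise estimate dominates the average.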
the focal points of the methodological debate. “Non-Bayesians” argue that it is disingenuous to claim the methodological high ground and then base the crucial prior density in a model purely on the basis of mathematical convenience. In a small sample, this assumed prior is going to dominate the results, whereas in a large one, the sampling theory estimates will dominate anyway.

This vector is a matrix weighted average of the prior and the least squares (sample) coefficient estimates, where the weights are the inverses of the prior and the conditional covariance matrices.[8] The smaller the variance of the estimator, the larger its weight, which makes sense. Also, still taking $\sigma^2$ as known, we can write the variance of the posterior normal distribution as

$$\text{Var}[\beta \mid y, X, \sigma^2] = \left\{\Sigma_0^{-1} + [\sigma^2 (X'X)^{-1}]^{-1}\right\}^{-1}. \tag{16-11}$$

Notice that the posterior variance combines the prior and conditional variances on the basis of their inverses.[9] We may interpret the noninformative prior as having infinite elements in $\Sigma_0$. This assumption would reduce this case to the earlier one.

Once again, it is necessary to account for the unknown $\sigma^2$. If our prior over $\sigma^2$ is to be informative as well, then the resulting distribution can be extremely cumbersome. A conjugate prior for $\beta$ and $\sigma^2$ that can be used is

$$g(\beta, \sigma^2) = g_{\beta|\sigma^2}(\beta \mid \sigma^2)\, g_{\sigma^2}(\sigma^2), \tag{16-12}$$

where $g_{\beta|\sigma^2}(\beta \mid \sigma^2)$ is normal, with mean $\beta^0$ and variance $\sigma^2 A$, and

$$g_{\sigma^2}(\sigma^2) = \frac{[m\sigma_0^2]^{m+1}}{\Gamma(m+1)} \left(\frac{1}{\sigma^2}\right)^{m} e^{-m\sigma_0^2 (1/\sigma^2)}. \tag{16-13}$$

This distribution is an inverted gamma distribution. It implies that $1/\sigma^2$ has a gamma distribution. The prior mean for $\sigma^2$ is $\sigma_0^2$ and the prior variance is $\sigma_0^4/(m-1)$.[10] The product in (16-12) produces what is called a normal-gamma prior, which is the natural conjugate prior for this form of the model. By integrating out $\sigma^2$, we would obtain the prior marginal for $\beta$ alone, which would be a multivariate $t$ distribution.[11] Combining (16-12) with (16-13) produces the joint posterior distribution for $\beta$ and $\sigma^2$. Finally, the marginal posterior distribution for $\beta$ is obtained by integrating out $\sigma^2$. It has been shown that this posterior distribution is multivariate $t$ with

$$E[\beta \mid y, X] = \left\{[\bar\sigma^2 A]^{-1} + [\bar\sigma^2 (X'X)^{-1}]^{-1}\right\}^{-1} \left\{[\bar\sigma^2 A]^{-1}\beta^0 + [\bar\sigma^2 (X'X)^{-1}]^{-1} b\right\} \tag{16-14}$$

and

$$\text{Var}[\beta \mid y, X] = \left(\frac{j}{j-2}\right) \left\{[\bar\sigma^2 A]^{-1} + [\bar\sigma^2 (X'X)^{-1}]^{-1}\right\}^{-1}, \tag{16-15}$$

where $j$ is a degrees of freedom parameter and $\bar\sigma^2$ is the Bayesian estimate of $\sigma^2$. The prior degrees of freedom $m$ is a parameter of the prior distribution for $\sigma^2$ that would have been determined at the outset. (See the following example.) Once again, it is clear that as the amount of data increases, the posterior density, and the estimates thereof, converge to the sampling theory results.

[8] Note that it will not follow that individual elements of the posterior mean vector lie between those of $\beta_0$ and $b$. See Judge et al. (1985, pp. 109–110) and Chamberlain and Leamer (1976).

[9] Precisely this estimator was proposed by Theil and Goldberger (1961) as a way of combining a previously obtained estimate of a parameter and a current body of new data. They called their result a “mixed estimator.” The term “mixed estimation” takes an entirely different meaning in the current literature, as we will see in Chapter 17.

[10] You can show this result by using gamma integrals. Note that the density is a function of $1/\sigma^2 = 1/x$ in the formula of (B-39), so to obtain $E[\sigma^2]$, we use the analog of $E[1/x] = \lambda/(P-1)$ and $E[(1/x)^2] = \lambda^2/[(P-1)(P-2)]$. In the density for $(1/\sigma^2)$, the counterparts to $\lambda$ and $P$ are $m\sigma_0^2$ and $m+1$.

[11] Full details of this (lengthy) derivation appear in Judge et al. (1985, pp. 106–110) and Zellner (1971).

TABLE 16.1  Estimates of the MPC

Years        Estimated MPC   Variance of b   Degrees of Freedom   Estimated σ
1940–1950    0.6848014       0.061878         9                   24.954
1950–2000    0.92481         0.000065865     49                   92.244

Example 16.3  Bayesian Estimate of the Marginal Propensity to Consume

In Example 3.2, an estimate of the marginal propensity to consume is obtained using 11 observations from 1940 to 1950, with the results shown in the top row of Table 16.1. A classical 95 percent confidence interval for $\beta$ based on these estimates is (0.8780, 1.2818). (The very wide interval probably results from the obviously poor specification of the model.) Based on noninformative priors for $\beta$ and $\sigma^2$, we would estimate the posterior density for $\beta$ to be univariate $t$ with 9 degrees of freedom, with mean 0.6848014 and variance (11/9)0.061878 = 0.075628. An HPD interval for $\beta$ would coincide with the confidence interval. Using the fourth quarter (yearly) values of the 1950–2000 data used in Example 6.3, we obtain the new estimates that appear in the second row of the table.

We take the first estimate and its estimated distribution as our prior for $\beta$ and obtain a posterior density for $\beta$ based on an informative prior instead. We assume for this exercise that $\sigma^2$ may be taken as known at the sample value of 29.954. Then,

$$\bar b = \left[\frac{1}{0.000065865} + \frac{1}{0.061878}\right]^{-1} \left[\frac{0.92481}{0.000065865} + \frac{0.6848014}{0.061878}\right] = 0.92455.$$

The weighted average is overwhelmingly dominated by the far more precise sample estimate from the larger sample. The posterior variance is the inverse in brackets, which is 0.000065795. This is close to the variance of the latter estimate. An HPD interval can be formed in the familiar fashion. It will be slightly narrower than the confidence interval, since the variance of the posterior distribution is slightly smaller than the variance of the sampling estimator. This reduction is the value of the prior information. (As we see here, the prior is not particularly informative.)

16.2.2.e HYPOTHESIS TESTING

The Bayesian methodology treats the classical approach to hypothesis testing with a large amount of skepticism. Two issues are especially problematic. First, a close examination of only the work we have done in Chapter 6 will show that because we are using consistent estimators, with a large enough sample, we will ultimately reject any (nested) hypothesis unless we adjust the significance level of the test downward as the sample size increases. Second, the all-or-nothing approach of either rejecting or not rejecting a hypothesis provides no method of simply sharpening our beliefs. Even the most committed of analysts might be reluctant to discard a strongly held prior based on a single sample of data, yet this is what the sampling methodology mandates. (Note, for example, the uncomfortable dilemma this creates in footnote 24 in Chapter 14.) The Bayesian approach to hypothesis testing is much more appealing in this regard. Indeed, the approach might be more appropriately called “comparing hypotheses,” since it essentially involves only making an assessment of which of two hypotheses has a higher probability of being correct.

The Bayesian approach to hypothesis testing bears large similarity to Bayesian estimation.[12] We have formulated two hypotheses, a “null,” denoted $H_0$, and an alternative, denoted $H_1$. These need not be complementary, as in $H_0$:
“statement A is true” versus $H_1$: “statement A is not true,” since the intent of the procedure is not to reject one hypothesis in favor of the other. For simplicity, however, we will confine our attention to hypotheses about the parameters in the regression model, which often are complementary. Assume that before we begin our experimentation (data gathering, statistical analysis) we are able to assign prior probabilities $P(H_0)$ and $P(H_1)$ to the two hypotheses. The prior odds ratio is simply the ratio

$$\text{Odds}_{\text{prior}} = \frac{P(H_0)}{P(H_1)}. \tag{16-16}$$

For example, one’s uncertainty about the sign of a parameter might be summarized in a prior odds over $H_0$: $\beta \ge 0$ versus $H_1$: $\beta < 0$ of 0.5/0.5 = 1. After the sample evidence is gathered, the prior will be modified, so the posterior is, in general,

$$\text{Odds}_{\text{posterior}} = B_{01} \times \text{Odds}_{\text{prior}}.$$

The value $B_{01}$ is called the Bayes factor for comparing the two hypotheses. It summarizes the effect of the sample data on the prior odds. The end result, $\text{Odds}_{\text{posterior}}$, is a new odds ratio that can be carried forward as the prior in a subsequent analysis.

The Bayes factor is computed by assessing the likelihoods of the data observed under the two hypotheses. We return to our first departure point, the likelihood of the data, given the parameters:

$$f(y \mid \beta, \sigma^2, X) = [2\pi\sigma^2]^{-n/2}\, e^{(-1/(2\sigma^2))(y - X\beta)'(y - X\beta)}. \tag{16-17}$$

Based on our priors for the parameters, the expected, or average likelihood, assuming that hypothesis $j$ is true ($j = 0, 1$), is

$$f(y \mid X, H_j) = E_{\beta,\sigma^2}[f(y \mid \beta, \sigma^2, X, H_j)] = \int_{\sigma^2}\int_{\beta} f(y \mid \beta, \sigma^2, X, H_j)\, g(\beta, \sigma^2)\, d\beta\, d\sigma^2.$$

(This conditional density is also the predictive density for $y$.) Therefore, based on the observed data, we use Bayes theorem to reassess the probability of $H_j$; the posterior probability is

$$P(H_j \mid y, X) = \frac{f(y \mid X, H_j)\, P(H_j)}{f(y)}.$$

The posterior odds ratio is $P(H_0 \mid y, X)/P(H_1 \mid y, X)$, so the Bayes factor is

$$B_{01} = \frac{f(y \mid X, H_0)}{f(y \mid X, H_1)}.$$

[12] For extensive discussion, see Zellner and Siow (1980) and Zellner (1985, pp. 275–305).

Example 16.4  Posterior Odds for the Classical Regression Model

Zellner (1971) analyzes the setting in which there are two possible explanations for the variation in a dependent variable $y$:

Model 0: $y = x_0'\beta_0 + \varepsilon_0$

and

Model 1: $y = x_1'\beta_1 + \varepsilon_1$.

We will briefly sketch his results. We form informative priors for $[\beta, \sigma^2]_j$, $j = 0, 1$, as specified in (16-12) and (16-13), that is
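As an aside on the posterior odds calculation defined before Example 16.4, the arithmetic can be sketched in a few lines. This is a deliberately simplified, hypothetical case and not Zellner's setting: the two hypotheses are point values for the mean of a normal sample with known standard deviation, so the "average likelihood" under each hypothesis is just the likelihood in the form of (16-17) and no integration over a prior is needed. It assumes Python with NumPy; the function names and the data vector are invented for the illustration.

```python
import numpy as np

def log_marginal_point(y, mu, sigma):
    """Log of f(y | H) when H fixes y ~ N(mu, sigma^2); cf. the form of (16-17)."""
    n = y.size
    return -0.5 * n * np.log(2.0 * np.pi * sigma**2) - np.sum((y - mu) ** 2) / (2.0 * sigma**2)

def posterior_odds(log_m0, log_m1, prior_odds=1.0):
    """Return (Bayes factor B01, posterior odds) from two log marginal likelihoods."""
    bayes_factor = np.exp(log_m0 - log_m1)   # work in logs to avoid underflow
    return bayes_factor, bayes_factor * prior_odds

# Hypothetical data; H0: mean 0 versus H1: mean 1, sigma known to be 1.
y = np.array([0.1, -0.3, 0.5, 0.4, 0.3])
b01, odds = posterior_odds(log_marginal_point(y, 0.0, 1.0),
                           log_marginal_point(y, 1.0, 1.0),
                           prior_odds=0.5 / 0.5)
prob_h0 = odds / (1.0 + odds)                # posterior probability of H0
print("B01:", b01, " posterior odds:", odds, " P(H0 | y):", prob_h0)
```

With equal prior probabilities the posterior odds equal the Bayes factor, and the posterior odds could be carried forward as the prior odds for a further sample.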
, multivariate normal and inverted gamma, respectively. Zellner then derives the Bayes factor for the posterior odds ratio. The derivation is lengthy and complicated, but for large $n$, with some simplifying assumptions, a useful formulation emerges. First, assume that the priors for $\sigma_0^2$ and $\sigma_1^2$ are the same. Second, assume that $[|A_0^{-1}|/|A_0^{-1} + X_0'X_0|]/[|A_1^{-1}|/|A_1^{-1} + X_1'X_1|] \to 1$. The first of these would be the usual situation, in which the uncertainty concerns the covariation between $y_i$ and $x_i$, not the amount of residual variation (lack of fit). The second concerns the relative amounts of information in the prior ($A$) versus the likelihood ($X'X$). These matrices are the inverses of the covariance matrices, or the precision matrices. [Note how these two matrices form the matrix weights in the computation of the posterior mean in (16-10).] Zellner (p. 310) discusses this assumption at some length. With these two assumptions, he shows that as $n$ grows large,[13]

$$B_{01} \approx \left(\frac{s_0^2}{s_1^2}\right)^{-(n+m)/2} = \left(\frac{1 - R_0^2}{1 - R_1^2}\right)^{-(n+m)/2}.$$

Therefore, the result favors the model that provides the better fit using $R^2$ as the fit measure. If we stretch Zellner’s analysis a bit by interpreting model 1 as “the model” and model 0 as “no model” (i.e., the relevant part of $\beta_0 = 0$, so $R_0^2 = 0$), then the ratio simplifies to

$$B_{01} = \left(1 - R_1^2\right)^{(n+m)/2}.$$

Thus, the better the fit of the regression, the lower the Bayes factor in favor of model 0 (no model), which makes intuitive sense.

Zellner and Siow (1980) have continued this analysis with noninformative priors for $\beta$ and $\sigma^2$. Specifically, they use the flat prior for $\ln\sigma$ [see (16-7)] and a multivariate Cauchy prior (which has infinite variances) for $\beta$. Their main result (3.10) is

$$B_{01} = \frac{\sqrt{\pi}}{\Gamma[(k+1)/2]} \left(\frac{n-K}{2}\right)^{k/2} (1 - R^2)^{(n-K-1)/2}.$$

This result is very much like the previous one, with some slight differences due to degrees of freedom corrections and the several approximations used to reach the first one.

[13] A ratio of exponentials that appears in Zellner’s result (his equation 10.50) is omitted. To the order of approximation in the result, this ratio vanishes from the final result. (Personal correspondence from A. Zellner to the author.)

16.2.3 USING BAYES THEOREM IN A CLASSICAL ESTIMATION PROBLEM: THE LATENT CLASS MODEL

Latent class modeling can be viewed as a means of modeling heterogeneity across individuals in a random parameters framework. We first encountered random parameters models in Section 13.8 in connection with panel data.[14] As we shall see, the latent class model provides an interesting hybrid of classical and Bayesian analysis. To define the latent class model, we begin with a random parameters formulation of the density of an observed random variable. We will assume that the data are a panel. Thus, the density of $y_{it}$ when the parameter vector is $\beta_i$ is $f(y_{it} \mid x_{it}, \beta_i)$. The parameter vector $\beta_i$ is randomly distributed over individuals according to

$$\beta_i = \beta + \Delta z_i + v_i, \tag{16-18}$$

where $\beta + \Delta z_i$ is the mean of the distribution, which depends on time invariant individual characteristics as well as parameters yet to be estimated, and the random variation comes from the individual heterogeneity, $v_i$. This random vector is assumed to have mean zero and covariance matrix $\Sigma$. The conditional density of the parameters is

$$g(\beta_i \mid z_i, \beta, \Delta, \Sigma) = g(v_i + \beta + \Delta z_i, \Sigma),$$

where $g(\cdot)$ is the underlying marginal density of the heterogeneity. The unconditional density for $y_{it}$ is obtained by integrating over $v_i$,

$$f(y_{it} \mid x_{it}, z_i, \beta, \Delta, \Sigma) = E_{\beta_i}[f(y_{it} \mid x_{it}, \beta_i)] = \int_{v_i} f(y_{it} \mid x_{it}, \beta_i)\, g(v_i + \beta + \Delta z_i, \Sigma)\, dv_i.$$

This result would provide the density that would enter the likelihood function for estimation of the model parameters. We will return to this model formulation in Chapter 17.

[14] In principle, the latent class model does not require panel data, but practical experience suggests that it does work best when individuals are observed more than once and is difficult to implement in a cross section.

The preceding has assumed $\beta_i$ has a continuous distribution. Suppose that $\beta_i$ is generated from a discrete distribution with $J$ values, or classes, so that the distribution of $\beta$ is over these $J$ vectors.[15] Thus, the model states that an individual belongs to one of the $J$ latent classes, but it is unknown from the sample data exactly which one. We will use the sample data to estimate the probabilities of class membership. The corresponding model formulation is now

$$f(y_{it} \mid x_{it}, z_i, \Delta) = \sum_{j=1}^{J} p_{ij}(\Delta, z_i)\, f(y_{it} \mid x_{it}, \beta_j),$$

where it remains to parameterize the class probabilities, $p_{ij}$, and the structural model, $f(y_{it} \mid x_{it}, \beta_j)$. The matrix $\Delta$ contains the parameters of the discrete distribution. It has $J$ rows (one for each class) and $M$ columns for the $M$ variables in $z_i$. (The structural mean and variance parameters $\beta$ and $\Sigma$ are no longer necessary.) At a minimum, $M = 1$ and $z_i$ contains a constant, if the class probabilities are fixed parameters. Finally, in order to accommodate the panel data nature of the sampling situation, we suppose that conditioned on $\beta_j$, observations $y_{it}$, $t = 1, \ldots, T$ are independent. Therefore, for a group of $T$ observations, the joint density is

$$f(y_{i1}, y_{i2}, \ldots, y_{iT} \mid \beta_j, x_{i1}, x_{i2}, \ldots, x_{iT}) = \prod_{t=1}^{T} f(y_{it} \mid x_{it}, \beta_j).$$

(We will consider models that provide correlation across observations in Chapters 17 and 21.) Inserting this result in the earlier density produces the likelihood function for a panel of data,

$$\ln L = \sum_{i=1}^{n} \ln\left[\sum_{j=1}^{J} p_{ij}(\Delta, z_i) \prod_{t=1}^{T} f(y_{it} \mid x_{it}, \beta_j)\right].$$

The class probabilities must be constrained to sum to 1. A simple approach is to reparameterize them as a set of logit probabilities,

$$p_{ij} = \frac{e^{\theta_{ij}}}{\sum_{j=1}^{J} e^{\theta_{ij}}}, \quad j = 1, \ldots, J, \quad \theta_{iJ} = 0, \quad \theta_{ij} = \delta_j' z_i, \ (\delta_J = 0). \tag{16-19}$$

(See Section 21.8 for development of this model for a set of probabilities.) Note the restriction on $\theta_{iJ}$. This is an identification restriction. Without it, the same set of probabilities will arise if an arbitrary vector is added to every $\delta_j$. The resulting log likelihood is a continuous function of the parameters $\beta_1, \ldots, \beta_J$ and $\delta_1, \ldots, \delta_J$. For all its apparent complexity, estimation of this model by direct maximization of the log likelihood is not especially difficult. [See Section E.5 and Greene (2001).] The number of classes that can be identified is likely to be relatively small (on the order of five or less), however, which is viewed as a drawback of this approach, and, in general (as might be expected), the less rich is the panel data set in terms of cross group variation, the more difficult it is to estimate this model.

[15] One can view this as a discrete approximation to the continuous distribution. This is also an extension of Heckman and Singer’s (1984b) model of latent heterogeneity, but the interpretation is a bit different here.

Estimation produces values for the structural parameters, $(\beta_j, \delta_j)$, $j = 1, \ldots, J$. With these in hand, we can compute the prior class probabilities, $p_{ij}$, using (16-19). For prediction purposes, one might be more interested in the posterior (on the data) class probabilities, which we can compute using Bayes theorem as

$$\text{Prob}(\text{class } j \mid \text{observation } i) = \frac{f(\text{observation } i \mid \text{class } j)\, \text{Prob}(\text{class } j)}{\sum_{j=1}^{J} f(\text{observation } i \mid \text{class } j)\, \text{Prob}(\text{class } j)}$$

$$= \frac{f(y_{i1}, y_{i2}, \ldots, y_{iT} \mid x_{i1}, x_{i2}, \ldots, x_{iT}, \beta_j)\, p_{ij}(\Delta, z_i)}{\sum_{j=1}^{J} f(y_{i1}, y_{i2}, \ldots, y_{iT} \mid x_{i1}, x_{i2}, \ldots, x_{iT}, \beta_j)\, p_{ij}(\Delta, z_i)} = w_{ij}.$$

This set of probabilities, $w_i = (w_{i1}, w_{i2}, \ldots, w_{iJ})$, gives the posterior density over the distribution of values of $\beta$, that is, $[\beta_1, \beta_2, \ldots, \beta_J]$. The Bayesian estimator of the (individual specific) parameter vector would be the posterior mean

$$\hat\beta_i^{p} = \hat E_j[\beta_j \mid \text{observation } i] = \sum_{j=1}^{J} w_{ij}\, \hat\beta_j.$$

Example 16.5  Applications of the Latent Class Model

The latent class formulation has provided an attractive platform for modeling latent heterogeneity. (See Greene (2001) for a survey.) For two examples, Nagin and Land (1993) employed the model to study age transitions through stages of criminal careers, and Wang et al. (1998) and Wedel et al. (1993) used the Poisson regression model to study counts of patents. To illustrate the estimator, we will apply the latent class model to the panel data binary choice application of firm product innovations studied by Bertschek and Lechner (1998).[16] They analyzed the dependent variable

$y_{it}$ = 1 if firm $i$ realized a product innovation in year $t$ and 0 if not.

Thus, this is a binary choice model. (See Section 21.2 for analysis of binary choice models.) The sample consists of 1270 German manufacturing firms observed for five years, 1984–1988. Independent variables in the model that we formulated were

$x_{it1}$ = constant,
$x_{it2}$ = log of sales,
$x_{it3}$ = relative size = ratio of employment in business unit to employment in the industry,
$x_{it4}$ = ratio of industry imports to (industry sales + imports),
$x_{it5}$ = ratio of industry foreign direct investment to (industry sales + imports),
$x_{it6}$ = productivity = ratio of industry value added to industry employment,
$x_{it7}$ = dummy variable indicating firm is in the raw materials sector,
$x_{it8}$ = dummy variable indicating firm is in the investment goods sector.

[16] We are grateful to the authors of this study who have generously loaned us their data for this analysis. The data are proprietary and cannot be made publicly available as are the other data sets used in our examples.

TABLE 16.2  Estimated Latent Class Model

                          Probit      Class 1     Class 2     Class 3     Posterior
Constant                  −1.96       −2.32       −2.71       −8.97       −3.38
                          (0.23)      (0.59)      (0.69)      (2.20)      (2.14)
ln Sales                   0.18        0.32        0.23        0.57        0.34
                          (0.022)     (0.061)     (0.072)     (0.18)      (0.09)
Rel. Size                  1.07        4.38        0.72        1.42        2.58
                          (0.14)      (0.89)      (0.37)      (0.76)      (1.30)
Import                     1.13        0.94        2.26        3.12        1.81
                          (0.15)      (0.37)      (0.53)      (1.38)      (0.74)
FDI                        2.85        2.20        2.81        8.37        3.63
                          (0.40)      (1.16)      (1.11)      (1.93)      (1.98)
Prod.                     −2.34       −5.86       −7.70       −0.91       −5.48
                          (0.72)      (2.70)      (4.69)      (6.76)      (1.78)
Raw Mtls                  −0.28       −0.11       −0.60        0.86       −0.08
                          (0.081)     (0.24)      (0.42)      (0.70)      (0.37)
Invest.                    0.19        0.13        0.41        0.47        0.29
                          (0.039)     (0.11)      (0.12)      (0.26)      (0.13)
ln L                      −4114.05    −3503.55
Class Prob. (Prior)                    0.469       0.331       0.200
                                      (0.0352)    (0.0333)    (0.0246)
Class Prob. (Posterior)                0.469       0.331       0.200
                                      (0.394)     (0.289)     (0.325)
Pred. Count                            649         366         255

Discussion of the data set may be found in the article (pp. 331–332 and 370). Our central model for the binary outcome is a probit model,

$$f(y_{it} \mid x_{it}, \beta_j) = \text{Prob}[y_{it} \mid x_{it}'\beta_j] = \Phi[(2y_{it} - 1)\, x_{it}'\beta_j], \quad y_{it} = 0, 1.$$

This is the specification used by the authors. We have retained it so we can compare the results of the various models. We also fit a model with year specific dummy variables instead of a single constant and with the industry sector dummy variables moved to the latent class probability equation. See Greene (2002) for analysis of the different specifications.

Estimates of the model parameters are presented in Table 16.2. The “probit” coefficients in the first column are those presented by Bertschek and Lechner.[17] The class specific parameter estimates cannot be compared directly, as the models are quite different. The estimated posterior mean shown, which is comparable to the one class results, is the sample average and standard deviation of the 1,270 firm specific posterior mean parameter vectors. They differ considerably from the probit model, but in each case, a confidence interval around the posterior mean contains the probit estimator. Finally, the (identical) prior and average of the sample posterior class probabilities are shown at the bottom of the table. The much larger empirical standard deviations reflect that the posterior estimates are based on aggregating the sample data and involve, as well, complicated functions of all the model parameters.

[17] The authors used the robust “sandwich” estimator for the standard errors (see Section 17.9) rather than the conventional negative inverse of the Hessian.

The estimated numbers of class members are computed by assigning to each firm the predicted
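The posterior class probabilities $w_{ij}$ and the firm specific posterior mean used in this example follow mechanically from the formulas of Section 16.2.3, and a short sketch may make the steps concrete. It is an illustration under stated assumptions, not the estimation code behind Table 16.2: it assumes Python with NumPy, takes the class parameters as already estimated, uses the probit density given above, and runs on small made-up inputs. All function and variable names (`class_priors`, `posterior_class_probs`, `delta`, `betas`, and so on) are invented here.

```python
import numpy as np
from math import erf, sqrt

def probit_cdf(z):
    """Standard normal CDF, elementwise, built from math.erf."""
    return 0.5 * (1.0 + np.vectorize(erf)(np.asarray(z, dtype=float) / sqrt(2.0)))

def class_priors(delta, z_i):
    """Logit class probabilities as in (16-19); the last row of delta should be 0."""
    theta = delta @ z_i
    e = np.exp(theta - theta.max())      # shifting theta leaves the ratios unchanged
    return e / e.sum()

def posterior_class_probs(y_i, X_i, betas, prior):
    """w_ij for one individual: class likelihood times prior, normalized over classes."""
    q = 2.0 * y_i - 1.0                  # (2 y_it - 1) in the probit density
    lik = np.array([np.prod(probit_cdf(q * (X_i @ b))) for b in betas])
    w = lik * prior
    return w / w.sum()

# Made-up inputs: J = 3 classes, K = 2 coefficients, T = 5 periods, M = 1 (constant).
delta = np.array([[0.5], [-0.2], [0.0]])
z_i = np.array([1.0])
betas = np.array([[-1.0, 0.5], [0.0, 0.2], [1.0, -0.3]])
X_i = np.column_stack([np.ones(5), [0.1, 1.2, -0.4, 0.8, 2.0]])
y_i = np.array([0, 1, 0, 1, 1])

p_i = class_priors(delta, z_i)
w_i = posterior_class_probs(y_i, X_i, betas, p_i)
beta_i_post = w_i @ betas                # posterior mean of the coefficient vector
predicted_class = int(np.argmax(w_i))    # class with the highest posterior probability
print(p_i, w_i, beta_i_post, predicted_class)
```

The last line mirrors the assignment rule used for the predicted counts: each individual is allocated to the class with the largest $w_{ij}$. With many periods the product of densities can underflow, in which case summing logs before normalizing would be the safer design.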
class associated with the highest posterior class probability. Finally, to explore the difference between the probit model and the latent class model, we have computed the probability of a product innovation at the five-year mean of the independent variables for each firm using the probit estimates and the firm specific posterior mean estimated coefficient vector. The two kernel density estimates shown in Figures 16.1 and 16.2 (see Section 16.4.1) show the effect of allowing the greater between firm variation in the coefficient vectors.

[FIGURE 16.1  Probit Probabilities. Kernel density estimate for PPR; vertical axis: Density; horizontal axis: PPR.]

[FIGURE 16.2  Latent Class Probabilities. Kernel density estimate for PLC; vertical axis: Density; horizontal axis: PLC.]

16.2.4 HIERARCHICAL BAYES ESTIMATION OF A RANDOM PARAMETERS MODEL BY MARKOV CHAIN MONTE CARLO SIMULATION

We now consider a Bayesian approach to estimation of the random parameters model in (16-18). For an individual $i$, the conditional density for the dependent variable in period $t$ is $f(y_{it} \mid x_{it}, \beta_i)$, where $\beta_i$ is the individual specific $K \times 1$ parameter vector and $x_{it}$ is individual specific data that enter the probability density.[18] For the sequence of $T$ observations, assuming conditional (on $\beta_i$) independence, person $i$’s contribution to the likelihood for the sample is

$$f(y_i \mid X_i, \beta_i) = \prod_{t=1}^{T} f(y_{it} \mid x_{it}, \beta_i), \tag{16-20}$$

where $y_i = (y_{i1}, \ldots, y_{iT})$ and $X_i = [x_{i1}, \ldots, x_{iT}]$. We will suppose that $\beta_i$ is distributed normally with mean $\beta$ and covariance matrix $\Sigma$. (This is the “hierarchical” aspect of the model.) The unconditional density would be the expected value over the possible values of $\beta_i$;

$$f(y_i \mid X_i, \beta, \Sigma) = \int_{\beta_i} \prod_{t=1}^{T} f(y_{it} \mid x_{it}, \beta_i)\, \phi_K[\beta_i \mid \beta, \Sigma]\, d\beta_i, \tag{16-21}$$

where $\phi_K[\beta_i \mid \beta, \Sigma]$ denotes the $K$ variate normal prior density for $\beta_i$ given $\beta$ and $\Sigma$. Maximum likelihood estimation of this model, which entails estimation of the “deep” parameters, $\beta$, $\Sigma$, then estimation of the individual specific parameters, $\beta_i$, using the same method we used for the latent class model, is considered in Section 17.8. For now, we consider the Bayesian approach to estimation of the parameters of this model.

[18] In order to avoid a layer of complication, we will embed the time invariant effect $\Delta z_i$ in $x_{it}'\beta$. A full treatment in the same fashion as the latent class model would be substantially more complicated in this setting (though it is quite straightforward in the maximum simulated likelihood approach discussed in Section 17.8).

To approach this from a Bayesian viewpoint, we will assign noninformative prior densities to $\beta$ and $\Sigma$. As is conventional, we assign a flat (noninformative) prior to $\beta$. The variance parameters are more involved. If it is assumed that the elements of $\beta_i$ are conditionally independent, then each element of the (now) diagonal matrix $\Sigma$ may be assigned the inverted gamma prior that we used in (16-13). A full matrix $\Sigma$ is handled by assigning to $\Sigma$ an inverted Wishart prior density with parameters scalar $K$ and matrix $K \times I$. [The Wishart density is a multivariate counterpart to the Chi-squared distribution. Discussion may be found in Zellner (1971, pp. 389–394).] This produces the joint posterior density,

$$\Lambda(\beta_1, \ldots, \beta_n, \beta, \Sigma \mid \text{all data}) = \left\{\prod_{i=1}^{n} \prod_{t=1}^{T} f(y_{it} \mid x_{it}, \beta_i)\, \phi_K[\beta_i \mid \beta, \Sigma]\right\} \times p(\beta, \Sigma). \tag{16-22}$$

This gives the joint density of all the unknown parameters conditioned on the observed data. Our Bayesian estimators of the parameters will be the posterior means for these $(n+1)K + K(K+1)/2$ parameters. In principle, this requires integration of (16-22) with respect to the components. As one might guess at this point, that integration is hopelessly complex and not remotely feasible. It is at this point that the recently developed techniques of Markov Chain Monte Carlo (MCMC) simulation estimation and the Metropolis-Hastings algorithm enter and enable us to do the estimation in a remarkably simple fashion.

The MCMC procedure makes use of a result that we have employed at many points in the preceding chapters. The joint density in (16-22) is exceedingly complex, and brute force integration is not feasible. Suppose, however, that we could draw random samples of $[\beta_1, \ldots, \beta_n, \beta, \Sigma]$ from this population. Then, sample statistics such as means computed from these random draws would converge to the moments of the underlying population. The laws of large numbers discussed in Appendix D would apply. That partially solves the problem. The distribution remains as complex as before, however, so how to draw the sample remains to be solved.

The Gibbs sampler and the Metropolis-Hastings algorithm can be used for sampling from the (hopelessly complex) joint density, $\Lambda(\beta_1, \ldots, \beta_n, \beta, \Sigma \mid \text{all data})$. The basic principle of the Gibbs sampler is described in Section E2.6. The core result is as follows: Consider a two-variable case, $f(x, y)$, in which $f(x \mid y)$ and $f(y \mid x)$ are known. A “Gibbs sequence” of draws, $y_0, x_0, y_1, x_1, y_2, \ldots, y_M, x_M$, is generated as follows. First, $y_0$ is specified “manually.” Then $x_0$ is obtained as a random draw from the population $f(x \mid y_0)$. Then $y_1$ is drawn from $f(y \mid x_0)$, and so on. The iteration is, generically, as follows.

1. Draw $x_j$ from $f(x \mid y_j)$.
2. Draw $y_{j+1}$ from $f(y \mid x_j)$.
3. Exit or return to step 1.

If this process is repeated enough times, then at the last step, $(x_j, y_j)$ together are a draw from the joint distribution.

Train (2001 and 2002, Chapter 12) describes how to use these results for this random parameters model.[19] The usefulness of this result for our current problem is that it is, indeed, possible to partition the joint distribution, and we can easily sample from the conditional distributions. We begin by partitioning the parameters into $\gamma = (\beta, \Sigma)$ and $\delta = (\beta_1, \ldots, \beta_n)$. Train proposes the following strategy: To obtain a draw from $\gamma \mid \delta$, we will use the Gibbs sampler to obtain a draw from the distribution of $(\beta \mid \Sigma, \delta)$, then one from the distribution of $(\Sigma \mid \beta, \delta)$. We will lay this out first, then turn to sampling from $\delta \mid \beta, \Sigma$.

[19] Train describes use of this method for “mixed logit” models. By writing the densities in generic form, we have extended his result to any general setting that involves a parameter vector in the fashion described above. In Section 17.8, we will apply this model to the probit model considered in the latent class model in Example 16.5.

Conditioned on $\delta$ and $\Sigma$, $\beta$ has a $K$-variate normal distribution with mean $\bar\beta = (1/n)\sum_{i=1}^{n} \beta_i$ and covariance matrix $(1/n)\Sigma$. To sample from this distribution we will first obtain the Cholesky factorization of $\Sigma = LL'$, where $L$ is a lower triangular matrix. [See Section A.7.11.] Let $v$ be a vector of $K$ draws from the standard normal distribution. Then, $\bar\beta + Lv$ has mean vector $\bar\beta + L \times 0 = \bar\beta$ and covariance matrix $LIL' = \Sigma$, which is exactly what we need. So, this shows how to sample a draw from the conditional distribution of $\beta$.

To obtain a random draw from the distribution of $\Sigma \mid \beta, \delta$, we will require a random draw from the inverted Wishart distribution. The marginal posterior distribution of $\Sigma \mid \beta, \delta$ is inverted Wishart with parameters scalar $K + n$ and matrix $W = (KI + nV)$, where $V = (1/n)\sum_{i=1}^{n} (\beta_i - \bar\beta)(\beta_i - \bar\beta)'$. Train (2001) suggests the following strategy for sampling a matrix from this distribution: Let $M$ be the lower triangular Cholesky factor of $W^{-1}$, so $MM' = W^{-1}$. Obtain $K + n$ draws of $v_k$ = $K$ standard normal variates. Then, obtain $S = M\left(\sum_{k=1}^{K+n} v_k v_k'\right)M'$. Then, $\Sigma_j = S^{-1}$ is a draw from the inverted Wishart distribution. [This is fairly straightforward, as it involves only random sampling from the standard normal distribution. For a diagonal $\Sigma$ matrix, that is, uncorrelated parameters in $\beta_i$, it simplifies a bit further. A draw for the nonzero $k$th diagonal element can be obtained using $(1 + nV_{kk})/\sum_{r=1}^{K+n} v_{rk}^2$.]

The difficult step is sampling $\beta_i$. For this step, we use the Metropolis-Hastings (M-H) algorithm suggested by Chib and Greenberg (1996) and Gelman et al. (1995). The procedure involves the following steps:

1. Given $\beta$ and $\Sigma$ and “tuning constant” $\tau$ (to be described below), compute $d = \tau Lv$, where $L$ is the Cholesky factorization of $\Sigma$ and $v$ is a vector of $K$ independent standard normal draws.
2. Create a trial value $\beta_{i1} = \beta_{i0} + d$, where $\beta_{i0}$ is the previous value.
3. The posterior distribution for $\beta_i$ is the likelihood that appears in (16-21) times the joint normal prior density, $\phi_K[\beta_i \mid \beta, \Sigma]$. Evaluate this posterior density at the trial value $\beta_{i1}$ and the previous value $\beta_{i0}$. Let
$$R_{10} = \frac{f(y_i \mid X_i, \beta_{i1})\, \phi_K(\beta_{i1} \mid \beta, \Sigma)}{f(y_i \mid X_i, \beta_{i0})\, \phi_K(\beta_{i0} \mid \beta, \Sigma)}.$$
4. Draw one observation, $u$, from the standard uniform distribution, $U[0, 1]$.
5. If $u$ […]

[…] 0], where $\varepsilon_i \sim N[0, 1]$. The estimator of $\beta$ under this specification will be inconsistent if the distribution is not normal or if $\varepsilon_i$ is heteroscedastic. Lewbel suggests the following: If (a) it can be assumed that $x_i$ contains a “special” variable, $v_i$, whose coefficient has a known sign (a method is developed for determining the sign) and (b) the density of $\varepsilon_i$ is independent of this variable, then a consistent estimator of $\beta$ can be obtained by linear regression of $[y_i - s(v_i)]/f(v_i \mid x_i)$ on $x_i$, where $s(v_i) = 1$ if $v_i > 0$ and 0 otherwise and $f(v_i \mid x_i)$ is a kernel density estimator of the density of $v_i \mid x_i$. Lewbel’s estimator is robust to heteroscedasticity and distribution. A method is also suggested for estimating the distribution of $\varepsilon_i$. Note that Lewbel’s estimator is semiparametric. His underlying model is a function of the parameters $\beta$, but the distribution is unspecified.

16.4 NONPARAMETRIC ESTIMATION

Researchers have long held reservations about the strong assumptions made in parametric models fit by maximum likelihood. The linear regression model with normal disturbances is a leading example. Splines, translog models, and polynomials all represent attempts to generalize the functional form. Nonetheless, questions remain about how much generality can be obtained with such approximations. The techniques of nonparametric estimation discard essentially all fixed assumptions about functional form and distribution. Given their very limited structure, it follows that nonparametric specifications rarely provide very precise inferences. The benefit is that what information is provided is extremely robust. The centerpiece of this set of techniques is the kernel density estimator that we have used in the preceding examples. We will examine some examples, then examine an application to a bivariate regression.[26]

[26] There is a large and rapidly growing literature in this area of econometrics. Two major references which provide an applied and theoretical foundation are Härdle (1990) and Pagan and Ullah (1999).

16.4.1 KERNEL DENSITY ESTIMATION

Sample statistics such as a mean, variance, and range give summary information about the values that a random variable may take. But they do not suffice to show the distribution of values that the random variable takes, and these may be of interest as well. The density of the variable is used for this purpose. A fully parametric approach to density estimation begins with an assumption about the form of a distribution. Estimation of the density is accomplished by estimation of the parameters of the distribution. To take the canonical example, if we decide that a variable is generated by a normal distribution with mean $\mu$ and variance $\sigma^2$, then the density is fully characterized by these parameters. It follows that

$$\hat f(x) = f(x \mid \hat\mu, \hat\sigma^2) = \frac{1}{\hat\sigma}\frac{1}{\sqrt{2\pi}} \exp\left[-\frac{1}{2}\left(\frac{x - \hat\mu}{\hat\sigma}\right)^2\right].$$

One may be unwilling to make a narrow distributional assumption about the density. The usual approach in this case is to begin with a histogram as a descriptive device. Consider an example. In Example 16.5, we estimated a model that produced a posterior estimator of a slope vector for each of the 1,270 firms in our sample. We might be interested in the distribution of these estimators across firms. In particular, the posterior estimates of the estimated slope on ln sales for the 1,270 firms have a sample mean of 0.3428, a standard deviation of 0.08919, a minimum of 0.2361 and a maximum of 0.5664. This tells us little about the distribution of values, though the fact that the mean is well below the midrange of 0.4013 might suggest some skewness. The histogram in Figure 16.4 is much more revealing. Based on what we see thus far, an assumption of normality might not be appropriate. The distribution seems to be bimodal, but certainly no particular functional form seems natural.

[FIGURE 16.4  Histogram for Estimated Coefficients. Histogram for variable BSALES; vertical axis: Frequency; horizontal axis: BSALES.]

The histogram is a crude density estimator. The rectangles in the figure are called bins. By construction, they are of equal width. (The parameters of the histogram are the number of bins, the bin width and the leftmost starting point. Each is important in the shape of the end result.) Since the frequency count in the bins sums to the sample size, by dividing each by $n$, we have a density estimator that satisfies an obvious requirement for a density; it sums (integrates) to one. We can formalize this by laying out the method by which the frequencies are obtained. Let $x_k$ be the midpoint of the $k$th bin and let $h$ be the width of the bin (we will shortly rename $h$ to be the bandwidth for the density estimator). The distances to the left and right boundaries of the bins are $h/2$. The frequency count in each bin is the number of observations in the sample which fall in the range $x_k \pm h/2$. Collecting terms, we have our “estimator”

$$\hat f(x) = \frac{1}{n}\,(\text{frequency in bin}_x)\,\cdots = \frac{1}{n}\sum^{n} \frac{1}{h}\,\mathbf{1}\Big(x - \frac{h}{2}\ \cdots$$

[…]

[…] $q(\mu, \theta)$ if $\theta = \theta(\mu)$, then if the parameter space is compact, the parameter vector is identified by the criterion function. We have not assumed compactness. For a convex parameter space, we would require the additional condition that there exist no sequences without limit points $\theta^m$ such that $q(\mu, \theta^m)$ converges to $q(\mu, \theta(\mu))$.

[27] In their Exercise 23.6, Griffiths, Hill, and Judge (1993), based (alas) on the first edition of this text, suggest a probit model for statewide voting outcomes that includes dummy variables for region, Northeast, Southeast, West, and Mountain. One would normally include three of the four dummy variables in the model, but Griffiths et al. carefully dropped two of them because in addition to the dummy variable trap, the Southeast variable is always zero when the dependent variable is zero. Inclusion of this variable produces a nonconcave likelihood function: the parameter on this variable diverges. Analysis of a closely related case appears as a caveat on page 272 of Amemiya (1985).

The approach taken here is to assume first that the model has some set of parameters. The identifiability criterion states that assuming this is the case, the probability limit of the criterion is maximized at these parameters. This result rests on convergence of the criterion function to a finite value at any point in the interior of the parameter space. Since the criterion function is a function of the data, this convergence requires a statement of the properties of the data, e.g., well behaved in some sense. Leaving that aside for the moment, interestingly, the results to this point already establish the consistency of the M estimator. In what might seem to be an extremely terse fashion, Amemiya (1985) defined identifiability simply as “existence of a consistent estimator.” We see that identification and the conditions for consistency of the M estimator are substantively the same.

This form of identification is necessary, in theory, to establish the consistency arguments. In any but the simplest cases, however, it will be extremely difficult to verify in practice. Fortunately, there are simpler ways to secure identification that will appeal more to the intuition:

• For the least squares estimator, a sufficient condition for identification is that any two different parameter vectors, $\theta$ and $\theta_0$, must be able to produce different values of the conditional mean function. This means that for any two different parameter vectors, there must be an $x_i$ which produces different values of the conditional mean function. You should verify that for the linear model, this is the full rank assumption A.2. For the model in Example 2.5, we have a regression in which $x_2 = x_3 + x_4$. In this case, any parameter vector of the form $(\beta_1, \beta_2 - a, \beta_3 + a, \beta_4 + a)$ p
roducesthesameconditionalmeanas(β1,β2,β3,β4)regardlessofxi,sothismodelisnotidentified.Thefullrankassumptionisneededtoprecludethisproblem.Fornonlinearregressions,theproblemismuchmorecomplicated,andthereisnosimplegenerality.Example9.2showsanonlinearregressionmodelthatisnotidentifiedandhowthelackofidentificationisremedied.•Forthemaximumlikelihoodestimator,aconditionsimilartothatfortheregressionmodelisneeded.Foranytwoparametervectors,θ=θ0itmustbepossibletoproducedifferentvaluesofthedensityf(yi|xi,θ)forsomedatavector(yi,xi).Manyeconometricmodelsthatarefitbymaximumlikelihoodare“indexfunction”modelsthatinvolvedensitiesoftheformf(y|x,θ)=f(y|xθ).Whenthisistheiiiicase,thesamefullrankassumptionthatappliestotheregressionmodelmaybesufficient.(Iftherearenootherparametersinthemodel,thenitwillbesufficient.)•FortheGMMestimator,notmuchsimplicitycanbegained.AsufficientconditionforidentificationisthatE[m¯(data,θ)]=0ifθ=θ0.(d)Behaviorofthedatahasbeendiscussedatvariouspointsintheprecedingtext.Theestimatorsarebasedonmeansoffunctionsofobservations.(Youcanseethisinallthreeofthedefinitionsabove.Derivativesofthesecriterionfunctionswilllikewisebemeansoffunctionsofobservations.)AnalysisoftheirlargesamplebehaviorswillturnondeterminingconditionsunderwhichcertainsamplemeansoffunctionsofobservationswillbesubjecttolawsoflargenumberssuchastheKhinchine(D.5.)orChebychev(D.6)theorems,andwhatmustbeassumedinordertoassertthat“root-n”timessamplemeansoffunctionswillobeycentrallimittheoremssuchastheLindberg–Feller(D.19)orLyapounov(D.20)theoremsforcrosssectionsortheMartingaleDifferenceCentralLimitTheoremfordependentobservations.Ultimately,thisistheissueinestablishingthestatisticalproperties.Theconvergencepropertyclaimedabovemustoccurinthecontextofthedata.TheseconditionshavebeendiscussedinSection5.2andinSection10.2.2undertheheadingof“wellbehaveddata.”Atthispoint,wewillassumethatthedataarewellbehaved.\nGreene-50240bookJune20,200218:2464CHAPTER16✦EstimationFrameworksinEconometrics16.5.4ASYMPTOTICPROPERTIESOFESTIM
ATORSWithallthisapparatusinplace,thefollowingarethestandardresultsonasymptoticpropertiesofMestimators:THEOREM16.1ConsistencyofMEstimatorsIf(a)theparameterspaceisconvexandthetrueparametervectorisapointinitsinterior;(b)thecriterionfunctionisconcave;(c)theparametersareidentifiedbythecriterionfunction;(d)thedataarewellbehaved,thentheMestimatorconvergesinprobabilitytothetrueparametervector.ProofsofconsistencyofMestimatorsrelyonafundamentalconvergenceresultthat,itself,restsonassumptions(a)through(d)above.Wehaveassumedidentification.Thefundamentaldeviceisthefollowing:Becauseofitsdependenceonthedata,q(θ|data)isarandomvariable.Weassumedin(c)thatplimq(θ|data)=q0(θ)foranypointintheparameterspace.Assumption(c)statesthatthemaximumofq0(θ)occursatq0(θ0),soθ0isthemaximizeroftheprobabilitylimit.Byitsdefinition,theestimatorθˆ,isthemaximizerofq(θ|data).Therefore,consistencyrequiresthelimitofthemaximizer,θˆbeequaltothemaximizerofthelimit,θ0.Ouridentificationconditionestablishesthis.WewillusethisapproachinsomewhatgreaterdetailinSection17.4.5awhereweestablishconsistencyofthemaximumlikelihoodestimator.THEOREM16.2AsymptoticNormalityofMEstimatorsIf(i)θˆisaconsistentestimatorofθ0whereθ0isapointintheinterioroftheparameterspace;(ii)q(θ|data)isconcaveandtwicecontinuouslydifferentiableinθinaneigh-borhoodofθ0;√d(iii)n[∂q(θ0|data)/∂θ0]−→N[0,];(iv)foranyθin,limPr[|(∂2q(θ|data)/∂θ∂θ)−h(θ)|>ε]=0∀ε>0kmkmn→∞wherehkm(θ)isacontinuousfinitevaluedfunctionofθ;(v)thematrixofelementsH(θ)isnonsingularatθ0,then√dn(θˆ−θ)−→N0,[H−1(θ)H−1(θ)]000TheproofofasymptoticnormalityisbasedonthemeanvaluetheoremfromcalculusandaTaylorseriesexpansionofthederivativesofthemaximizedcriterionfunctionaroundthetrueparametervector;√∂q(θˆ|data)√∂q(θ0|data)∂2q(θ¯|data)√n=0=n+n(θˆ−θ0).∂θˆ∂θ0∂θ¯∂θ¯\nGreene-50240bookJune20,200218:2CHAPTER16✦EstimationFrameworksinEconometrics465Thesecondderivativeisevaluatedatapointθ¯thatisbetweenθˆandθ0,thatis,θ¯=wθˆ+(1−w)θ0forsome00andnotpurchaseitifyi≤0.Letusformthelikelihoodfunctionfortheobserveddata,wh
ichare(purchaseornot)andincome.Therandomvariableinthismodelis“purchase”or“notpurchase”—thereareonlytwooutcomes.TheprobabilityofapurchaseisProb(purchase|β1,β2,σ,xi)=Prob(yi>0|β1,β2,σ,xi)=Prob(β1+β2xi+εi>0|β1,β2,σ,xi)=Prob[εi>−(β1+β2xi)|β1,β2,σ,xi]=Prob[εi/σ>−(β1+β2xi)/σ|β1,β2,σ,xi]=Prob[zi>−(β1+β2xi)/σ|β1,β2,σ,xi]wherezihasastandardnormaldistribution.Theprobabilityofnotpurchaseisjustoneminusthisprobability.Thelikelihoodfunctionis[Prob(purchase|β1,β2,σ,xi)][1−Prob(purchase|β1,β2,σ,xi)].i=purchasedi=notpurchasedWeneedgonofurthertoseethattheparametersofthismodelarenotidentified.Ifβ1,β2andσareallmultipliedbythesamenonzeroconstant,regardlessofwhatitis,thenProb(purchase)isunchanged,1−Prob(purchase)isalso,andthelikelihoodfunctiondoesnotchange.Thismodelrequiresanormalization.Theoneusuallyusedisσ=1,butsomeauthors[e.g.,Horowitz(1993)]haveusedβ1=1instead.17.3EFFICIENTESTIMATION:THEPRINCIPLEOFMAXIMUMLIKELIHOODTheprincipleofmaximumlikelihoodprovidesameansofchoosinganasymptoticallyefficientestimatorforaparameterorasetofparameters.Thelogicofthetechniqueiseasilyillustratedinthesettingofadiscretedistribution.Considerarandomsampleofthefollowing10observationsfromaPoissondistribution:5,0,1,1,0,3,2,3,4,and1.Thedensityforeachobservationise−θθyif(yi|θ)=.yi!\nGreene-50240bookJune26,200215:8CHAPTER17✦MaximumLikelihoodEstimation4710.13260.12240.11220.10200.09187L(x)0.081625100.0714))xx0.0612((LL0.0510lnlnL(x)0.0480.0360.0240.012000.50.81.11.41.72.02.32.62.93.23.5FIGURE17.1LikelihoodandLog-likelihoodFunctionsforaPoissonDistribution.Sincetheobservationsareindependent,theirjointdensity,whichisthelikelihoodforthissample,is10−10θ10y−10θ20eθi=1ieθf(y1,y2,...,y10|θ)=f(yi|θ)=10=.i=1i=1yi!207,360Thelastresultgivestheprobabilityofobservingthisparticularsample,assumingthataPoissondistributionwithasyetunknownparameterθgeneratedthedata.Whatvalueofθwouldmakethissamplemostprobable?Figure17.1plotsthisfunctionforvariousvaluesofθ.Ithasasinglemodeatθ=2,whichwouldbethemaximumlikelihoodestimate,orMLE,ofθ.Consider
maximizing $L(\theta|y)$ with respect to $\theta$. Since the log function is monotonically increasing and easier to work with, we usually maximize $\ln L(\theta|y)$ instead; in sampling from a Poisson population,

$$\ln L(\theta|y) = -n\theta + \ln\theta \sum_{i=1}^{n} y_i - \sum_{i=1}^{n} \ln(y_i!),$$
$$\frac{\partial \ln L(\theta|y)}{\partial\theta} = -n + \frac{1}{\theta}\sum_{i=1}^{n} y_i = 0 \;\Rightarrow\; \hat\theta_{ML} = \bar y_n.$$

For the assumed sample of observations,

$$\ln L(\theta|y) = -10\theta + 20\ln\theta - 12.242,$$
$$\frac{d \ln L(\theta|y)}{d\theta} = -10 + \frac{20}{\theta} = 0 \;\Rightarrow\; \hat\theta = 2,$$

and

$$\frac{d^2 \ln L(\theta|y)}{d\theta^2} = \frac{-20}{\theta^2} < 0 \;\Rightarrow\; \text{this is a maximum.}$$

The solution is the same as before. Figure 17.1 also plots the log of $L(\theta|y)$ to illustrate the result.

The reference to the probability of observing the given sample is not exact in a continuous distribution, since a particular sample has probability zero. Nonetheless, the principle is the same. The values of the parameters that maximize $L(\theta|\text{data})$ or its log are the maximum likelihood estimates, denoted $\hat\theta$. Since the logarithm is a monotonic function, the values that maximize $L(\theta|\text{data})$ are the same as those that maximize $\ln L(\theta|\text{data})$. The necessary condition for maximizing $\ln L(\theta|\text{data})$ is

$$\frac{\partial \ln L(\theta|\text{data})}{\partial\theta} = 0. \qquad (17\text{-}4)$$

This is called the likelihood equation. The general result then is that the MLE is a root of the likelihood equation. The application to the parameters of the dgp for a discrete random variable is suggestive that maximum likelihood is a "good" use of the data. It remains to establish this as a general principle. We turn to that issue in the next section.

Example 17.2 Log Likelihood Function and Likelihood Equations for the Normal Distribution
In sampling from a normal distribution with mean $\mu$ and variance $\sigma^2$, the log-likelihood function and the likelihood equations for $\mu$ and $\sigma^2$ are

$$\ln L(\mu, \sigma^2) = -\frac{n}{2}\ln(2\pi) - \frac{n}{2}\ln\sigma^2 - \frac{1}{2}\sum_{i=1}^{n}\frac{(y_i - \mu)^2}{\sigma^2}, \qquad (17\text{-}5)$$
$$\frac{\partial \ln L}{\partial\mu} = \frac{1}{\sigma^2}\sum_{i=1}^{n}(y_i - \mu) = 0, \qquad (17\text{-}6)$$
$$\frac{\partial \ln L}{\partial\sigma^2} = -\frac{n}{2\sigma^2} + \frac{1}{2\sigma^4}\sum_{i=1}^{n}(y_i - \mu)^2 = 0. \qquad (17\text{-}7)$$

To solve the likelihood equations, multiply (17-6) by $\sigma^2$ and solve for $\hat\mu$, then insert this solution in (17-7) and solve for $\sigma^2$. The solutions are

$$\hat\mu_{ML} = \frac{1}{n}\sum_{i=1}^{n} y_i = \bar y_n \quad\text{and}\quad \hat\sigma^2_{ML} = \frac{1}{n}\sum_{i=1}^{n}(y_i - \bar y_n)^2. \qquad (17\text{-}8)$$

17.4 PROPERTIES OF MAXIMUM LIKELIHOOD ESTIMATORS

Maximum likelihood estimators (MLEs) are most attractive because of their large-sample or asymptotic properties.

DEFINITION 17.2 Asymptotic Efficiency
An estimator is asymptotically efficient if it is consistent, asymptotically normally distributed (CAN), and has an asymptotic covariance matrix that is not larger than the asymptotic covariance matrix of any other consistent, asymptotically normally distributed estimator.[2]

[2] Not larger is defined in the sense of (A-118): The covariance matrix of the less efficient estimator equals that of the efficient estimator plus a nonnegative definite matrix.

If certain regularity conditions are met, the MLE will have these properties. The finite sample properties are sometimes less than optimal. For example, the MLE may be biased; the MLE of $\sigma^2$ in Example 17.2 is biased downward. The occasional statement that the properties of the MLE are only optimal in large samples is not true, however. It can be shown that when sampling is from an exponential family of distributions (see Definition 18.1), there will exist sufficient statistics. If so, MLEs will be functions of them, which means that when minimum variance unbiased estimators exist, they will be MLEs. [See Stuart and Ord (1989).] Most applications in econometrics do not involve exponential families, so the appeal of the MLE remains primarily its asymptotic properties.

We use the following notation: $\hat\theta$ is the maximum likelihood estimator; $\theta_0$ denotes the true value of the parameter vector; $\theta$ denotes another possible value of the parameter vector, not the MLE and not necessarily the true values. Expectation based on the true values of the parameters is denoted $E_0[\cdot]$. If we assume that the regularity conditions discussed below are met by $f(x, \theta_0)$, then we have the following theorem.

THEOREM 17.1 Properties of an MLE
Under regularity, the maximum likelihood estimator (MLE) has the following asymptotic properties:
M1. Consistency: plim $\hat\theta = \theta_0$.
M2. Asymptotic normality: $\hat\theta \overset{a}{\sim} N[\theta_0, \{I(\theta_0)\}^{-1}]$, where $I(\theta_0) = -E_0[\partial^2 \ln L/\partial\theta_0\,\partial\theta_0']$.
M3. Asymptotic efficiency: $\hat\theta$ is asymptotically efficient and achieves the Cramér–Rao lower bound for consistent estimators, given in M2 and Theorem C.2.
M4. Invariance: The maximum likelihood estimator of $\gamma_0 = c(\theta_0)$ is $c(\hat\theta)$ if $c(\theta_0)$ is a continuous and continuously differentiable function.

17.4.1 REGULARITY CONDITIONS

To sketch proofs of these results, we first obtain some useful properties of probability density functions. We assume that $(y_1, \ldots, y_n)$ is a random sample from the population with density function $f(y_i|\theta_0)$ and that the following regularity conditions hold. [Our statement of these is informal. A more rigorous treatment may be found in Stuart and Ord (1989) or Davidson and MacKinnon (1993).]

DEFINITION 17.3 Regularity Conditions
R1. The first three derivatives of $\ln f(y_i|\theta)$ with respect to $\theta$ are continuous and finite for almost all $y_i$ and for all $\theta$. This condition ensures the existence of a certain Taylor series approximation and the finite variance of the derivatives of $\ln L$.
R2. The conditions necessary to obtain the expectations of the first and second derivatives of $\ln f(y_i|\theta)$ are met.
R3. For all values of $\theta$, $|\partial^3 \ln f(y_i|\theta)/\partial\theta_j\,\partial\theta_k\,\partial\theta_l|$ is less than a function that has a finite expectation. This condition will allow us to truncate the Taylor series.

With these regularity conditions, we will obtain the following fundamental characteristics of $f(y_i|\theta)$: D1 is simply a consequence of the definition of the likelihood function. D2 leads to the moment condition which defines the maximum likelihood estimator. On the one hand, the MLE is found as the maximizer of a function, which mandates finding the vector which equates the gradient to zero. On the other, D2 is a more fundamental relationship which places the MLE in the class of generalized method of moments estimators. D3 produces what is known as the information matrix equality. This relationship shows how to obtain the asymptotic covariance matrix of the MLE.

17.4.2 PROPERTIES OF REGULAR DENSITIES

Densities that are "regular" by Definition 17.3 have three properties which are used in establishing the properties of maximum likelihood estimators:

THEOREM 17.2 Moments of the Derivatives of the Log-Likelihood
D1. $\ln f(y_i|\theta)$, $g_i = \partial \ln f(y_i|\theta)/\partial\theta$, and $H_i = \partial^2 \ln f(y_i|\theta)/\partial\theta\,\partial\theta'$, $i = 1, \ldots, n$, are all random samples of random variables. This statement follows from our assumption of random sampling. The notation $g_i(\theta_0)$ and $H_i(\theta_0)$ indicates the derivative evaluated at $\theta_0$.
D2. $E_0[g_i(\theta_0)] = 0$.
D3. $\text{Var}[g_i(\theta_0)] = -E[H_i(\theta_0)]$.

Condition D1 is simply a consequence of the definition of the density.

For the moment, we allow the range of $y_i$ to depend on the parameters; $A(\theta_0) \le y_i \le B(\theta_0)$. (Consider, for example, finding the maximum likelihood estimator of $\theta_0$ for a continuous uniform distribution with range $[0, \theta_0]$.) (In the following, the single integral $\int \ldots dy_i$ would be used to indicate the multiple integration over all the elements of a multivariate $y_i$ if that were necessary.) By definition,

$$\int_{A(\theta_0)}^{B(\theta_0)} f(y_i|\theta_0)\,dy_i = 1.$$

Now, differentiate this expression with respect to $\theta_0$. Leibnitz's theorem gives

$$\frac{\partial \int_{A(\theta_0)}^{B(\theta_0)} f(y_i|\theta_0)\,dy_i}{\partial\theta_0} = \int_{A(\theta_0)}^{B(\theta_0)} \frac{\partial f(y_i|\theta_0)}{\partial\theta_0}\,dy_i + f(B(\theta_0)|\theta_0)\frac{\partial B(\theta_0)}{\partial\theta_0} - f(A(\theta_0)|\theta_0)\frac{\partial A(\theta_0)}{\partial\theta_0} = 0.$$

If the second and third terms go to zero, then we may interchange the operations of differentiation and integration. The necessary condition is that $\lim_{y_i \to A(\theta_0)} f(y_i|\theta_0) = \lim_{y_i \to B(\theta_0)} f(y_i|\theta_0) = 0$. (Note that the uniform distribution suggested above violates this condition.) Sufficient conditions are that the range of the observed random variable, $y_i$, does not depend on the parameters, which means that $\partial A(\theta_0)/\partial\theta_0 = \partial B(\theta_0)/\partial\theta_0 = 0$, or that the density is zero at the terminal points. This condition, then, is regularity condition R2. The latter is usually assumed, and we will assume it in what follows. So,

$$\frac{\partial \int f(y_i|\theta_0)\,dy_i}{\partial\theta_0} = \int \frac{\partial f(y_i|\theta_0)}{\partial\theta_0}\,dy_i = \int \frac{\partial \ln f(y_i|\theta_0)}{\partial\theta_0}\,f(y_i|\theta_0)\,dy_i = E_0\!\left[\frac{\partial \ln f(y_i|\theta_0)}{\partial\theta_0}\right] = 0.$$

This proves D2.

Since we may interchange the operations of integration and differentiation, we differentiate under the integral once again to obtain

$$\int \left[\frac{\partial^2 \ln f(y_i|\theta_0)}{\partial\theta_0\,\partial\theta_0'}\,f(y_i|\theta_0) + \frac{\partial \ln f(y_i|\theta_0)}{\partial\theta_0}\,\frac{\partial f(y_i|\theta_0)}{\partial\theta_0'}\right] dy_i = 0.$$

But

$$\frac{\partial f(y_i|\theta_0)}{\partial\theta_0'} = f(y_i|\theta_0)\,\frac{\partial \ln f(y_i|\theta_0)}{\partial\theta_0'},$$

and the integral of a sum is the sum of integrals. Therefore,

$$-\int \left[\frac{\partial^2 \ln f(y_i|\theta_0)}{\partial\theta_0\,\partial\theta_0'}\right] f(y_i|\theta_0)\,dy_i = \int \left[\frac{\partial \ln f(y_i|\theta_0)}{\partial\theta_0}\,\frac{\partial \ln f(y_i|\theta_0)}{\partial\theta_0'}\right] f(y_i|\theta_0)\,dy_i.$$

The left-hand side of the equation is the negative of the expected second derivatives matrix. The right-hand side is the expected square (outer product) of the first derivative vector. But, since this vector has expected value 0 (we just showed this), the right-hand side is the variance of the first derivative vector, which proves D3:

$$\text{Var}_0\!\left[\frac{\partial \ln f(y_i|\theta_0)}{\partial\theta_0}\right] = E_0\!\left[\frac{\partial \ln f(y_i|\theta_0)}{\partial\theta_0}\,\frac{\partial \ln f(y_i|\theta_0)}{\partial\theta_0'}\right] = -E_0\!\left[\frac{\partial^2 \ln f(y_i|\theta_0)}{\partial\theta_0\,\partial\theta_0'}\right].$$

17.4.3 THE LIKELIHOOD EQUATION

The log-likelihood function is

$$\ln L(\theta|y) = \sum_{i=1}^{n} \ln f(y_i|\theta).$$

The first derivative vector, or score vector, is

$$g = \frac{\partial \ln L(\theta|y)}{\partial\theta} = \sum_{i=1}^{n} \frac{\partial \ln f(y_i|\theta)}{\partial\theta} = \sum_{i=1}^{n} g_i. \qquad (17\text{-}9)$$

Since we are just adding terms, it follows from D1 and D2 that at $\theta_0$,

$$E_0\!\left[\frac{\partial \ln L(\theta_0|y)}{\partial\theta_0}\right] = E_0[g_0] = 0, \qquad (17\text{-}10)$$

which is the likelihood equation mentioned earlier.

17.4.4 THE INFORMATION MATRIX EQUALITY

The Hessian of the log-likelihood is

$$H = \frac{\partial^2 \ln L(\theta|y)}{\partial\theta\,\partial\theta'} = \sum_{i=1}^{n} \frac{\partial^2 \ln f(y_i|\theta)}{\partial\theta\,\partial\theta'} = \sum_{i=1}^{n} H_i.$$

Evaluating once again at $\theta_0$, by taking

$$E_0[g_0 g_0'] = E_0\!\left[\sum_{i=1}^{n}\sum_{j=1}^{n} g_{0i}\,g_{0j}'\right]$$

and, because of D1, dropping terms with unequal subscripts, we obtain

$$E_0[g_0 g_0'] = E_0\!\left[\sum_{i=1}^{n} g_{0i}\,g_{0i}'\right] = E_0\!\left[\sum_{i=1}^{n} (-H_{0i})\right] = -E_0[H_0],$$

so that

$$\text{Var}_0\!\left[\frac{\partial \ln L(\theta_0|y)}{\partial\theta_0}\right] = E_0\!\left[\frac{\partial \ln L(\theta_0|y)}{\partial\theta_0}\,\frac{\partial \ln L(\theta_0|y)}{\partial\theta_0'}\right] = -E_0\!\left[\frac{\partial^2 \ln L(\theta_0|y)}{\partial\theta_0\,\partial\theta_0'}\right]. \qquad (17\text{-}11)$$

This very useful result is known as the information matrix equality.

17.4.5 ASYMPTOTIC PROPERTIES OF THE MAXIMUM LIKELIHOOD ESTIMATOR

We can now sketch a derivation of the asymptotic properties of the MLE. Formal proofs of these results require some fairly intricate mathematics. Two widely cited derivations are those of Cramér (1948) and Amemiya (1985). To suggest the flavor of the exercise, we will sketch an analysis provided by Stuart and Ord (1989) for a simple case, and indicate where it will be necessary to extend the derivation if it were to be fully general.

17.4.5.a CONSISTENCY

We assume that $f(y_i|\theta_0)$ is a possibly multivariate density which at this point does not depend on covariates, $x_i$. Thus, this is the iid, random sampling case. Since $\hat\theta$ is the MLE, in any finite sample, for any $\theta \neq \hat\theta$ (including the true $\theta_0$) it must be true that

$$\ln L(\hat\theta) \ge \ln L(\theta). \qquad (17\text{-}12)$$

Consider, then, the random variable $L(\theta)/L(\theta_0)$. Since the log function is strictly concave, from Jensen's Inequality (Theorem D.8), we have

$$E_0\!\left[\log\frac{L(\theta)}{L(\theta_0)}\right] < \log E_0\!\left[\frac{L(\theta)}{L(\theta_0)}\right].$$

[...]

[...] $E_0[(1/n)\ln L(\theta_0)] > E_0[(1/n)\ln L(\theta)]$ for any $\theta \neq \theta_0$ (including $\hat\theta$). This result is (17-15). In words, the expected value of the log-likelihood is maximized at the true value of the parameters. For any $\theta$, including $\hat\theta$,

$$(1/n)\ln L(\theta) = (1/n)\sum_{i=1}^{n} \ln f(y_i|\theta)$$

is the sample mean of $n$ iid random variables, with expectation $E_0[(1/n)\ln L(\theta)]$. Since the sampling is iid by the regularity conditions, we can invoke the Khinchine Theorem, D.5; the sample mean converges in probability to the population mean. Using $\theta = \hat\theta$, it follows from Theorem 17.3 that as $n \to \infty$, $\lim \text{Prob}\{[(1/n)\ln L(\hat\theta)] < [(1/n)\ln L(\theta_0)]\} = 1$ if $\hat\theta \neq \theta_0$. But $\hat\theta$ is the MLE, so for every $n$, $(1/n)\ln L(\hat\theta) \ge (1/n)\ln L(\theta_0)$. The only way these can both be true is if $(1/n)$ times the sample log-likelihood evaluated at the MLE converges to the population expectation of $(1/n)$ times the log-likelihood evaluated at the true parameters. There remains one final step.

Does $(1/n)\ln L(\hat\theta) \to (1/n)\ln L(\theta_0)$ imply that $\hat\theta \to \theta_0$? If there is a single parameter and the likelihood function is one to one, then clearly so. For more general cases, this requires a further characterization of the likelihood function. If the likelihood is strictly continuous and twice differentiable, which we assumed in the regularity conditions, and if the parameters of the model are identified, which we assumed at the beginning of this discussion, then yes, it does, so we have the result.

This is a heuristic proof. As noted, formal presentations appear in more advanced treatises than this one. We should also note that we have assumed at several points that sample means converged to the population expectations. This is likely to be true for the sorts of applications usually encountered in econometrics, but a fully general set of results would look more closely at this condition. Second, we have assumed iid sampling in the preceding; that is, the density for $y_i$ does not depend on any other variables, $x_i$. This will almost never be true in practice. Assumptions about the behavior of these variables will enter the proofs as well. For example, in assessing the large sample behavior of the least squares estimator, we have invoked an assumption that the data are "well behaved." The same sort of consideration will apply here as well. We will return to this issue shortly. With all this in place, we have property M1, plim $\hat\theta = \theta_0$.

17.4.5.b ASYMPTOTIC NORMALITY

At the maximum likelihood estimator, the gradient of the log-likelihood equals zero (by definition), so

$$g(\hat\theta) = 0.$$

(This is the sample statistic, not the expectation.) Expand this set of equations in a second-order Taylor series around the true parameters $\theta_0$. We will use the mean value theorem to truncate the Taylor series at the second term.

$$g(\hat\theta) = g(\theta_0) + H(\bar\theta)(\hat\theta - \theta_0) = 0.$$

The Hessian is evaluated at a point $\bar\theta$ that is between $\hat\theta$ and $\theta_0$ ($\bar\theta = w\hat\theta + (1 - w)\theta_0$ for some $0 < w < 1$).

[...]

[...] $> 0$ in the regression model, for example.
• Identifiability. Estimation must be feasible. This is the subject of Definition 17.1 concerning identification and the surrounding discussion.
• Well behaved data. Laws of large numbers apply to sample means involving the data and some form of central limit theorem (generally Lyapounov) can be applied to the gradient. Ergodic stationarity is broad enough to encompass any situation that is likely to arise in practice, though it is probably more general than we need for most applications, since we will not encounter dependent observations specifically until later in the book. The definitions in Chapter 5 are assumed to hold generally.

With these in place, analysis is essentially the same in character as that we used in the linear regression model in Chapter 5 and follows precisely along the lines of Section 16.5.

17.5 THREE ASYMPTOTICALLY EQUIVALENT TEST PROCEDURES

The next several sections will discuss the most commonly used test procedures: the likelihood ratio, Wald, and Lagrange multiplier tests. [Extensive discussion of these procedures is given in Godfrey (1988).] We consider maximum likelihood estimation of a parameter $\theta$ and a test of the hypothesis $H_0: c(\theta) = 0$. The logic of the tests can be seen in Figure 17.2.[5] The figure plots the log-likelihood function $\ln L(\theta)$, its derivative with respect to $\theta$, $d\ln L(\theta)/d\theta$, and the constraint $c(\theta)$. There are three approaches to testing the hypothesis suggested in the figure:

• Likelihood ratio test. If the restriction $c(\theta) = 0$ is valid, then imposing it should not lead to a large reduction in the log-likelihood function. Therefore, we base the test on the difference, $\ln L_U - \ln L_R$, where $L_U$ is the value of the likelihood function at the unconstrained value of $\theta$ and $L_R$ is the value of the likelihood function at the restricted estimate.
• Wald test. If the restriction is valid, then $c(\hat\theta_{MLE})$ should be close to zero since the MLE is consistent. Therefore, the test is based on $c(\hat\theta_{MLE})$. We reject the hypothesis if this value is significantly different from zero.
• Lagrange multiplier test. If the restriction is valid, then the restricted estimator should be near the point that maximizes the log-likelihood. Therefore, the slope of the log-likelihood function should be near zero at the restricted estimator. The test is based on the slope of the log-likelihood at the point where the function is maximized subject to the restriction.

[5] See Buse (1982). Note that the scale of the vertical axis would be different for each curve. As such, the points of intersection have no significance.

[Figure 17.2: Three Bases for Hypothesis Tests. The figure plots ln L(θ), d ln L(θ)/dθ, and c(θ) against θ, and marks the likelihood ratio, Lagrange multiplier, and Wald distances at the restricted estimate and the MLE.]

These three tests are asymptotically equivalent under the null hypothesis, but they can behave rather differently in a small sample. Unfortunately, their small-sample properties are unknown, except in a few special cases. As a consequence, the choice among them is typically made on the basis of ease of computation. The likelihood ratio test requires calculation of both restricted and unrestricted estimators. If both are simple to compute, then this way to proceed is convenient. The Wald test requires only the unrestricted estimator, and the Lagrange multiplier test requires only the restricted estimator. In some problems, one of these estimators may be much easier to compute than the other. For example, a linear model is simple to estimate but becomes nonlinear and cumbersome if a nonlinear constraint is imposed. In this case, the Wald statistic might be preferable. Alternatively, restrictions sometimes amount to the removal of nonlinearities, which would make the Lagrange multiplier test the simpler procedure.

17.5.1 THE LIKELIHOOD RATIO TEST

Let $\theta$ be a vector of parameters to be estimated, and let $H_0$ specify some sort of restriction on these parameters. Let $\hat\theta_U$ be the maximum likelihood estimator of $\theta$ obtained without regard to the constraints, and let $\hat\theta_R$ be the constrained maximum likelihood estimator. If $\hat L_U$ and $\hat L_R$ are the likelihood functions evaluated at these two estimates, then the likelihood ratio is

$$\lambda = \frac{\hat L_R}{\hat L_U}. \qquad (17\text{-}21)$$

This function must be between zero and one. Both likelihoods are positive, and $\hat L_R$ cannot be larger than $\hat L_U$. (A restricted optimum is never superior to an unrestricted one.) If $\lambda$ is too small, then doubt is cast on the restrictions.

An example from a discrete distribution helps to fix these ideas. In estimating from a sample of 10 from a Poisson distribution at the beginning of Section 17.3, we found the MLE of the parameter $\theta$ to be 2. At this value, the likelihood, which is the probability of observing the sample we did, is $0.104 \times 10^{-7}$. Are these data consistent with $H_0: \theta = 1.8$? $L_R = 0.936 \times 10^{-8}$, which is, as expected, smaller. This particular sample is somewhat less probable under the hypothesis.

The formal test procedure is based on the following result.

THEOREM 17.5 Limiting Distribution of the Likelihood Ratio Test Statistic
Under regularity and under $H_0$, the large sample distribution of $-2\ln\lambda$ is chi-squared, with degrees of freedom equal to the number of restrictions imposed.

The null hypothesis is rejected if this value exceeds the appropriate critical value from the chi-squared tables. Thus, for the Poisson example,

$$-2\ln\lambda = -2\ln\!\left(\frac{0.0936}{0.104}\right) = 0.21072.$$

This chi-squared statistic with one degree of freedom is not significant at any conventional level, so we would not reject the hypothesis that $\theta = 1.8$ on the basis of this test.[6]

[6] Of course, our use of the large-sample result in a sample of 10 might be questionable.

It is tempting to use the likelihood ratio test to test a simple null hypothesis against a simple alternative. For example, we might be interested in the Poisson setting in testing $H_0: \theta = 1.8$ against $H_1: \theta = 2.2$. But the test cannot be used in this fashion. The degrees of freedom of the chi-squared statistic for the likelihood ratio test equals the reduction in the number of dimensions in the parameter space that results from imposing the restrictions. In testing a simple null hypothesis against a simple alternative, this value is zero.[7] Second, one sometimes encounters an attempt to test one distributional assumption against another with a likelihood ratio test; for example, a certain model will be estimated assuming a normal distribution and then assuming a t distribution. The ratio of the two likelihoods is then compared to determine which distribution is preferred. This comparison is also inappropriate. The parameter spaces, and hence the likelihood functions of the two cases, are unrelated.

[7] Note that because both likelihoods are restricted in this instance, there is nothing to prevent $-2\ln\lambda$ from being negative.

17.5.2 THE WALD TEST

A practical shortcoming of the likelihood ratio test is that it usually requires estimation of both the restricted and unrestricted parameter vectors. In complex models, one or the other of these estimates may be very difficult to compute. Fortunately, there are two alternative testing procedures, the Wald test and the Lagrange multiplier test, that circumvent this problem. Both tests are based on an estimator that is asymptotically normally distributed.

These two tests are based on the distribution of the full rank quadratic form considered in Section B.11.6. Specifically,

$$\text{If } x \sim N_J[\mu, \Sigma], \text{ then } (x - \mu)'\Sigma^{-1}(x - \mu) \sim \text{chi-squared}[J]. \qquad (17\text{-}22)$$

In the setting of a hypothesis test, under the hypothesis that $E(x) = \mu$, the quadratic form has the chi-squared distribution. If the hypothesis that $E(x) = \mu$ is false, however, then the quadratic form just given will, on average, be larger than it would be if the hypothesis were true.[8] This condition forms the basis for the test statistics discussed in this and the next section.

[8] If the mean is not $\mu$, then the statistic in (17-22) will have a noncentral chi-squared distribution. This distribution has the same basic shape as the central chi-squared distribution, with the same degrees of freedom, but lies to the right of it. Thus, a random draw from the noncentral distribution will tend, on average, to be larger than a random observation from the central distribution.

Let $\hat\theta$ be the vector of parameter estimates obtained without restrictions. We hypothesize a set of restrictions

$$H_0: c(\theta) = q.$$

If the restrictions are valid, then at least approximately $\hat\theta$ should satisfy them. If the hypothesis is erroneous, however, then $c(\hat\theta) - q$ should be farther from 0 than would be explained by sampling variability alone. The device we use to formalize this idea is the Wald test.

THEOREM 17.6 Limiting Distribution of the Wald Test Statistic
The Wald statistic is
$$W = [c(\hat\theta) - q]'\,\big(\text{Asy.Var}[c(\hat\theta) - q]\big)^{-1}\,[c(\hat\theta) - q].$$
Under $H_0$, in large samples, $W$ has a chi-squared distribution with degrees of freedom equal to the number of restrictions [i.e., the number of equations in $c(\hat\theta) - q = 0$]. A derivation of the limiting distribution of the Wald statistic appears in Theorem 6.15.

This test is analogous to the chi-squared statistic in (17-22) if $c(\hat\theta) - q$ is normally distributed with the hypothesized mean of 0. A large value of $W$ leads to rejection of the hypothesis. Note, finally, that $W$ only requires computation of the unrestricted model. One must still compute the covariance matrix appearing in the preceding quadratic form. This result is the variance of a possibly nonlinear function, which we treated earlier.

$$\text{Est.Asy.Var}[c(\hat\theta) - q] = \hat C\,\text{Est.Asy.Var}[\hat\theta]\,\hat C', \qquad \hat C = \frac{\partial c(\hat\theta)}{\partial\hat\theta'}. \qquad (17\text{-}23)$$

That is, $C$ is the $J \times K$ matrix whose $j$th row is the derivatives of the $j$th constraint with respect to the $K$ elements of $\theta$. A common application occurs in testing a set of linear restrictions.

For testing a set of linear restrictions $R\theta = q$, the Wald test would be based on

$$H_0: c(\theta) - q = R\theta - q = 0,$$
$$\hat C = \frac{\partial c(\hat\theta)}{\partial\hat\theta'} = R, \qquad (17\text{-}24)$$
$$\text{Est.Asy.Var}[c(\hat\theta) - q] = R\,\text{Est.Asy.Var}[\hat\theta]\,R',$$

and

$$W = [R\hat\theta - q]'\,[R\,\text{Est.Asy.Var}(\hat\theta)\,R']^{-1}\,[R\hat\theta - q].$$

The degrees of freedom is the number of rows in $R$.

If $c(\theta) - q$ is a single restriction, then the Wald test will be the same as the test based on the confidence interval developed previously. If the test is

$$H_0: \theta = \theta_0 \quad\text{versus}\quad H_1: \theta \neq \theta_0,$$

then the earlier test is based on

$$z = \frac{|\hat\theta - \theta_0|}{s(\hat\theta)}, \qquad (17\text{-}25)$$

where $s(\hat\theta)$ is the estimated asymptotic standard error. The test statistic is compared to the appropriate value from the standard normal table. The Wald test will be based on

$$W = [(\hat\theta - \theta_0) - 0]\,\big(\text{Asy.Var}[(\hat\theta - \theta_0) - 0]\big)^{-1}\,[(\hat\theta - \theta_0) - 0] = \frac{(\hat\theta - \theta_0)^2}{\text{Asy.Var}[\hat\theta]} = z^2. \qquad (17\text{-}26)$$

Here $W$ has a chi-squared distribution with one degree of freedom, which is the distribution of the square of the standard normal test statistic in (17-25).

To summarize, the Wald test is based on measuring the extent to which the unrestricted estimates fail to satisfy the hypothesized restrictions. There are two shortcomings of the Wald test. First, it is a pure significance test against the null hypothesis, not necessarily for a specific alternative hypothesis. As such, its power may be limited in some settings. In fact, the test statistic tends to be rather large in applications. The second shortcoming is not shared by either of the other test statistics discussed here. The Wald statistic is not invariant to the formulation of the restrictions. For example, for a test of the hypothesis that a function $\theta = \beta/(1 - \gamma)$ equals a specific value $q$ there are two approaches one might choose. A Wald test based directly on $\theta - q = 0$ would use a statistic based on the variance of this nonlinear function. An alternative approach would be to analyze the linear restriction $\beta - q(1 - \gamma) = 0$, which is an equivalent, but linear, restriction. The Wald statistics for these two tests could be different and might lead to different inferences. These two shortcomings have been widely viewed as compelling arguments against use of the Wald test. But, in its favor, the Wald test does not rely on a strong distributional assumption, as do the likelihood ratio and Lagrange multiplier tests. The recent econometrics literature is replete with applications that are based on distribution free estimation procedures, such as the GMM method. As such, in recent years, the Wald test has enjoyed a redemption of sorts.

17.5.3 THE LAGRANGE MULTIPLIER TEST

The third test procedure is the Lagrange multiplier (LM) or efficient score (or just score) test. It is based on the restricted model instead of the unrestricted model. Suppose that we maximize the log-likelihood subject to the set of constraints $c(\theta) - q = 0$. Let $\lambda$ be a vector of Lagrange multipliers and define the Lagrangean function

$$\ln L^*(\theta) = \ln L(\theta) + \lambda'(c(\theta) - q).$$

The solution to the constrained maximization problem is the root of

$$\frac{\partial \ln L^*}{\partial\theta} = \frac{\partial \ln L(\theta)}{\partial\theta} + C'\lambda = 0,$$
$$\frac{\partial \ln L^*}{\partial\lambda} = c(\theta) - q = 0, \qquad (17\text{-}27)$$

where $C'$ is the transpose of the derivatives matrix in the second line of (17-23). If the restrictions are valid, then imposing them will not lead to a significant difference in the maximized value of the likelihood function. In the first-order conditions, the meaning is that the second term in the derivative vector will be small. In particular, $\lambda$ will be small. We could test this directly, that is, test $H_0: \lambda = 0$, which leads to the Lagrange multiplier test. There is an equivalent simpler formulation, however. At the restricted maximum, the derivatives of the log-likelihood function are

$$\frac{\partial \ln L(\hat\theta_R)}{\partial\hat\theta_R} = -\hat C'\hat\lambda = \hat g_R. \qquad (17\text{-}28)$$

If the restrictions are valid, at least within the range of sampling variability, then $\hat g_R = 0$. That is, the derivatives of the log-likelihood evaluated at the restricted parameter vector will be approximately zero. The vector of first derivatives of the log-likelihood is the vector of efficient scores. Since the test is based on this vector, it is called the score test as well as the Lagrange multiplier test. The variance of the first derivative vector is the information matrix, which we have used to compute the asymptotic covariance matrix of the MLE. The test statistic is based on reasoning analogous to that underlying the Wald test statistic.

THEOREM 17.7 Limiting Distribution
oftheLagrangeMultiplierStatisticTheLagrangemultiplierteststatisticis∂lnL(θˆR)−1∂lnL(θˆR)LM=[I(θˆR)].∂θˆR∂θˆRUnderthenullhypothesis,LMhasalimitingchi-squareddistributionwithdegreesoffreedomequaltothenumberofrestrictions.Alltermsarecomputedattherestrictedestimator.\nGreene-50240bookJune26,200215:8490CHAPTER17✦MaximumLikelihoodEstimationTheLMstatistichasausefulform.LetgˆiRdenotetheithterminthegradientofthelog-likelihoodfunction.Then,ngˆ=gˆ=Gˆi,RiRRi=1whereGˆisthen×Kmatrixwithithrowequaltogandiisacolumnof1s.IfweuseRiRtheBHHH(outerproductofgradients)estimatorin(17-18)toestimatetheHessian,then[Iˆ(θˆ)]−1=[GˆGˆ]−1RRandLM=iGˆ[GˆGˆ]−1Gˆi.RRRRNow,sinceiiequalsn,LM=n(iGˆ[GˆGˆ]−1Gˆi/n)=nR2,whichisntimestheRRRRiuncenteredsquaredmultiplecorrelationcoefficientinalinearregressionofacolumnof1sonthederivativesofthelog-likelihoodfunctioncomputedattherestrictedestimator.Wewillencounterthisresultinvariousformsatseveralpointsinthebook.17.5.4ANAPPLICATIONOFTHELIKELIHOODBASEDTESTPROCEDURESConsider,again,thedatainExampleC.1.InExample17.4,theparameterβinthemodel1f(y|x,β)=e−yi/(β+xi)(17-29)iiβ+xiwasestimatedbymaximumlikelihood.Forconvenience,letβi=1/(β+xi).Thisexpo-nentialdensityisarestrictedformofamoregeneralgammadistribution,ρf(y|x,β,ρ)=βiyρ−1e−yiβi.(17-30)iii(ρ)Therestrictionisρ=1.9WeconsidertestingthehypothesisH0:ρ=1versusH1:ρ=1usingthevariousproceduresdescribedpreviously.Thelog-likelihoodanditsderivativesarennnlnL(β,ρ)=ρlnβi−nln(ρ)+(ρ−1)lnyi−yiβi,i=1i=1i=1∂lnLnn∂lnLnn=−ρβ+yβ2,=lnβ−n(ρ)+lny,(17-31)iiiii∂β∂ρi=1i=1i=1i=1∂2lnLnn∂2lnL∂2lnLn=ρβ2−2yβ3,=−n(ρ),=−β.∂β2iii∂ρ2∂β∂ρii=1i=1i=19Thegammafunction(ρ)andthegammadistributionaredescribedinSectionsB.4.5andE.5.3.\nGreene-50240bookJune26,200215:8CHAPTER17✦MaximumLikelihoodEstimation491TABLE17.1MaximumLikelihoodEstimatesQuantityUnrestrictedEstimateaRestrictedEstimateβ−4.7198(2.344)15.6052(6.794)ρ3.1517(0.7943)1.0000(0.000)lnL−82.91444−88.43771∂lnL/∂β0.00000.0000∂lnL/∂ρ0.00007.9162∂2lnL/∂β2−0.85628−0.021659∂2lnL/∂ρ2−7.4569−32.8987∂2lnL/∂β∂ρ−2.2423−
0.66885aEstimatedasymptoticstandarderrorsbasedonVaregiveninparentheses.[Recallthat(ρ)=dln(ρ)/dρand(ρ)=d2ln(ρ)/dρ2.]Unrestrictedmaximumlikelihoodestimatesofβandρareobtainedbyequatingthetwofirstderivativestozero.Therestrictedmaximumlikelihoodestimateofβisobtainedbyequating∂lnL/∂βtozerowhilefixingρatone.TheresultsareshowninTable17.1.Threeestimatorsareavailablefortheasymptoticcovariancematrixoftheestimatorsofθ=(β,ρ).UsingtheactualHessianasin(17-17),wecomputeV=[−∂2lnL/∂θ∂θ]−1atthemaxi-imumlikelihoodestimates.Forthismodel,itiseasytoshowthatE[yi|xi]=ρ(β+xi)(eitherbydirectintegrationor,moresimply,byusingtheresultthatE[∂lnL/∂β]=0todeduceit).Therefore,wecanalsousetheexpectedHessianasin(17-16)tocom-puteV={−E[∂2lnL/∂θ∂θ]}−1.Finally,byusingthesumsofsquaresandcrossEiproductsofthefirstderivatives,weobtaintheBHHHestimatorin(17-18),VB=[(∂lnL/∂θ)(∂lnL/∂θ)]−1.ResultsinTable17.1arebasedonV.iThethreeestimatorsoftheasymptoticcovariancematrixproducenotablydifferentresults:5.495−1.6524.897−1.47313.35−4.314V=,VE=,VB=.−1.6520.6309−1.4730.5770−4.3141.535Giventhesmallsamplesize,thedifferencesaretobeexpected.Nonetheless,thestrikingdifferenceoftheBHHHestimatoristypicalofitserraticperformanceinsmallsamples.•ConfidenceIntervalTest:A95percentconfidenceintervalforρbasedonthe√unrestrictedestimatesis3.1517±1.960.6309=[1.5942,4.7085].Thisintervaldoesnotcontainρ=1,sothehypothesisisrejected.•LikelihoodRatioTest:TheLRstatisticisλ=−2[−88.43771−(−82.91444)]=11.0465.Thetablevalueforthetest,withonedegreeoffreedom,is3.842.Sincethecomputedvalueislargerthanthiscriticalvalue,thehypothesisisagainrejected.•WaldTest:TheWaldtestisbasedontheunrestrictedestimates.Forthisrestric-tion,c(θ)−q=ρ−1,dc(ρ)/ˆdρˆ=1,Est.Asy.Var[c(ρ)ˆ−q]=Est.Asy.Var[ρˆ]=0.6309,soW=(3.1517−1)2/[0.6309]=7.3384.Thecriticalvalueisthesameasthepreviousone.Hence,H0isonceagainrejected.NotethattheWaldstatisticisthesquareofthecorrespondingteststatisticthatwould√beusedintheconfidenceintervaltest,|3.1517−1|/0.6309=2.70895.\nGreene-50240bookJune26,200215:8492
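The confidence interval, likelihood ratio, and Wald calculations above can be checked with a few lines of arithmetic. The following is a minimal sketch in Python that uses only the figures quoted from Table 17.1 and the text. The variable names are illustrative assumptions and are not part of the original presentation.

```python
import math

# Figures copied from Table 17.1 and the surrounding text.
rho_hat = 3.1517              # unrestricted MLE of rho
var_rho = 0.6309              # Est. Asy. Var[rho_hat], the (2,2) element of V
lnL_unrestricted = -82.91444  # log-likelihood at the unrestricted estimates
lnL_restricted = -88.43771    # log-likelihood with rho fixed at one
rho_null = 1.0                # value of rho under H0
crit_5pct = 3.842             # chi-squared[1] critical value quoted in the text

# Confidence interval test: does the 95 percent interval cover rho = 1?
se_rho = math.sqrt(var_rho)
ci_lower = rho_hat - 1.96 * se_rho
ci_upper = rho_hat + 1.96 * se_rho
reject_ci = not (ci_lower <= rho_null <= ci_upper)

# Likelihood ratio test: -2 times (restricted minus unrestricted) log-likelihood.
lr_stat = -2.0 * (lnL_restricted - lnL_unrestricted)
reject_lr = lr_stat > crit_5pct

# Wald test: squared distance from the null, scaled by the estimated variance.
wald_stat = (rho_hat - rho_null) ** 2 / var_rho
reject_wald = wald_stat > crit_5pct

print(f"95% CI for rho: [{ci_lower:.4f}, {ci_upper:.4f}], reject H0: {reject_ci}")
print(f"LR statistic:   {lr_stat:.4f}, reject H0: {reject_lr}")
print(f"Wald statistic: {wald_stat:.4f}, reject H0: {reject_wald}")
print(f"sqrt(Wald):     {math.sqrt(wald_stat):.5f}")
```

The square root of the Wald statistic is printed last because, for a single restriction, it coincides with the ratio used in the confidence interval test.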
• Lagrange Multiplier Test: The Lagrange multiplier test is based on the restricted estimators. The estimated asymptotic covariance matrix of the derivatives used to compute the statistic can be any of the three estimators discussed earlier. The BHHH estimator, $V_B$, is the empirical estimator of the variance of the gradient and is the one usually used in practice. This computation produces

$$LM = \begin{bmatrix} 0.0000 & 7.9162 \end{bmatrix} \begin{bmatrix} 0.0099438 & 0.26762 \\ 0.26762 & 11.197 \end{bmatrix}^{-1} \begin{bmatrix} 0.0000 \\ 7.9162 \end{bmatrix} = 15.687.$$

The conclusion is the same as before. Note that the same computation done using $V$ rather than $V_B$ produces a value of 5.1182. As before, we observe substantial small sample variation produced by the different estimators.

The latter three test statistics have substantially different values. It is possible to reach different conclusions, depending on which one is used. For example, if the test had been carried out at the 1 percent level of significance instead of 5 percent and LM had been computed using $V$, then the critical value from the chi-squared distribution would have been 6.635 and the hypothesis would not have been rejected by the LM test. Asymptotically, all three tests are equivalent. But, in a finite sample such as this one, differences are to be expected.^10 Unfortunately, there is no clear rule for how to proceed in such a case, which highlights the problem of relying on a particular significance level and drawing a firm reject or accept conclusion based on sample evidence.

^10 For further discussion of this problem, see Berndt and Savin (1977).

17.6 APPLICATIONS OF MAXIMUM LIKELIHOOD ESTIMATION

We now examine three applications of the maximum likelihood estimator. The first extends the results of Chapters 2 through 5 to the linear regression model with normally distributed disturbances. In the second application, we fit a nonlinear regression model by maximum likelihood. This application illustrates the effect of transformation of the dependent variable. The third application is a relatively straightforward use of the maximum likelihood technique in a nonlinear model that does not involve the normal distribution. This application illustrates the sorts of extensions of the MLE into settings that depart from the linear model of the preceding chapters and that are typical in econometric analysis.

17.6.1 THE NORMAL LINEAR REGRESSION MODEL

The linear regression model is

$$y_i = x_i'\beta + \varepsilon_i.$$

The likelihood function for a sample of $n$ independent, identically and normally distributed disturbances is

$$L = (2\pi\sigma^2)^{-n/2} e^{-\varepsilon'\varepsilon/(2\sigma^2)}. \tag{17-32}$$

The transformation from $\varepsilon_i$ to $y_i$ is $\varepsilon_i = y_i - x_i'\beta$, so the Jacobian for each observation, $|\partial\varepsilon_i/\partial y_i|$, is one.^11 Making the transformation, we find that the likelihood function for the $n$ observations on the observed random variable is

$$L = (2\pi\sigma^2)^{-n/2} e^{(-1/(2\sigma^2))(y - X\beta)'(y - X\beta)}. \tag{17-33}$$

To maximize this function with respect to $\beta$, it will be necessary to maximize the exponent or minimize the familiar sum of squares. Taking logs, we obtain the log-likelihood function for the classical regression model:

$$\ln L = -\frac{n}{2}\ln 2\pi - \frac{n}{2}\ln\sigma^2 - \frac{(y - X\beta)'(y - X\beta)}{2\sigma^2}. \tag{17-34}$$

The necessary conditions for maximizing this log-likelihood are

$$\begin{bmatrix} \dfrac{\partial \ln L}{\partial\beta} \\[2ex] \dfrac{\partial \ln L}{\partial\sigma^2} \end{bmatrix} = \begin{bmatrix} \dfrac{X'(y - X\beta)}{\sigma^2} \\[2ex] \dfrac{-n}{2\sigma^2} + \dfrac{(y - X\beta)'(y - X\beta)}{2\sigma^4} \end{bmatrix} = \begin{bmatrix} \mathbf{0} \\ 0 \end{bmatrix}. \tag{17-35}$$

The values that satisfy these equations are

$$\hat\beta_{ML} = (X'X)^{-1}X'y = b \quad\text{and}\quad \hat\sigma^2_{ML} = \frac{e'e}{n}. \tag{17-36}$$

The slope estimator is the familiar one, whereas the variance estimator differs from the least squares value by the divisor of $n$ instead of $n - K$.^12 The Cramér–Rao bound for the variance of an unbiased estimator is the negative inverse of the expectation of

$$\begin{bmatrix} \dfrac{\partial^2 \ln L}{\partial\beta\,\partial\beta'} & \dfrac{\partial^2 \ln L}{\partial\beta\,\partial\sigma^2} \\[2ex] \dfrac{\partial^2 \ln L}{\partial\sigma^2\,\partial\beta'} & \dfrac{\partial^2 \ln L}{\partial(\sigma^2)^2} \end{bmatrix} = \begin{bmatrix} -\dfrac{X'X}{\sigma^2} & -\dfrac{X'\varepsilon}{\sigma^4} \\[2ex] -\dfrac{\varepsilon'X}{\sigma^4} & \dfrac{n}{2\sigma^4} - \dfrac{\varepsilon'\varepsilon}{\sigma^6} \end{bmatrix}. \tag{17-37}$$

In taking expected values, the off-diagonal term vanishes leaving

$$[I(\beta, \sigma^2)]^{-1} = \begin{bmatrix} \sigma^2(X'X)^{-1} & \mathbf{0} \\ \mathbf{0}' & 2\sigma^4/n \end{bmatrix}. \tag{17-38}$$

The least squares slope estimator is the maximum likelihood estimator for this model. Therefore, it inherits all the desirable asymptotic properties of maximum likelihood estimators.

We showed earlier that $s^2 = e'e/(n - K)$ is an unbiased estimator of $\sigma^2$. Therefore, the maximum likelihood estimator is biased toward zero:

$$E\left[\hat\sigma^2_{ML}\right] = \frac{n - K}{n}\sigma^2 = \left(1 - \frac{K}{n}\right)\sigma^2 < \sigma^2. \tag{17-39}$$

^11 See (B-41) in Section B.5. The analysis to follow is conditioned on $X$. To avoid cluttering the notation, we will leave this aspect of the model implicit in the results. As noted earlier, we assume that the data generating process for $X$ does not involve $\beta$ or $\sigma^2$ and that the data are well behaved as discussed in Chapter 5.

^12 As a general rule, maximum likelihood estimators do not make corrections for degrees of freedom.

Despite its small-sample bias, the maximum likelihood estimator of $\sigma^2$ has the same desirable asymptotic properties. We see in (17-39) that $\hat\sigma^2 = (1 - K/n)s^2$, so $s^2$ and $\hat\sigma^2$ differ only through the term $K/n$, which vanishes in large samples. It is instructive to formalize the asymptotic equivalence of the two. From (17-38), we know that

$$\sqrt{n}\left(\hat\sigma^2_{ML} - \sigma^2\right) \stackrel{d}{\longrightarrow} N[0, 2\sigma^4].$$

It follows that

$$z_n = \left(1 - \frac{K}{n}\right)^{-1}\left[\sqrt{n}\left(\hat\sigma^2_{ML} - \sigma^2\right) + \frac{K}{\sqrt{n}}\sigma^2\right] \stackrel{d}{\longrightarrow} \left(1 - \frac{K}{n}\right)^{-1}\left\{N[0, 2\sigma^4] + \frac{K}{\sqrt{n}}\sigma^2\right\}.$$

But $K/\sqrt{n}$ and $K/n$ vanish as $n \to \infty$, so the limiting distribution of $z_n$ is also $N[0, 2\sigma^4]$. Since $z_n = \sqrt{n}(s^2 - \sigma^2)$, we have shown that the asymptotic distribution of $s^2$ is the same as that of the maximum likelihood estimator.

The standard test statistic for assessing the validity of a set of linear restrictions in the linear model, $R\beta - q = 0$, is the $F$ ratio,

$$F[J, n - K] = \frac{(e_*'e_* - e'e)/J}{e'e/(n - K)} = \frac{(Rb - q)'[Rs^2(X'X)^{-1}R']^{-1}(Rb - q)}{J}.$$

With normally distributed disturbances, the $F$ test is valid in any sample size. There remains a problem with nonlinear restrictions of the form $c(\beta) = 0$, since the counterpart to $F$, which we will examine here, has validity only asymptotically even with normally distributed disturbances. In this section, we will reconsider the Wald statistic and examine two related statistics, the likelihood ratio statistic and the Lagrange multiplier statistic. These statistics are both based on the likelihood function and, like the Wald statistic, are generally valid only asymptotically.

No simplicity is gained by restricting ourselves to linear restrictions at this point, so we will consider general hypotheses of the form

$$H_0\colon c(\beta) = 0, \qquad H_1\colon c(\beta) \neq 0.$$

The Wald statistic for testing this hypothesis and its limiting distribution under $H_0$ would be

$$W = c(b)'\{C(b)[\hat\sigma^2(X'X)^{-1}]C(b)'\}^{-1}c(b) \stackrel{d}{\longrightarrow} \chi^2[J], \tag{17-40}$$

where

$$C(b) = [\partial c(b)/\partial b']. \tag{17-41}$$

The likelihood ratio (LR) test is carried out by comparing the values of the log-likelihood function with and without the restrictions imposed. We leave aside for the present how the restricted estimator $b_*$ is computed (except for the linear model, which we saw earlier). The test statistic and its limiting distribution under $H_0$ are

$$LR = -2[\ln L_* - \ln L] \stackrel{d}{\longrightarrow} \chi^2[J]. \tag{17-42}$$

The log-likelihood for the regression model is given in (17-34). The first-order conditions imply that regardless of how the slopes are computed, the estimator of $\sigma^2$ without restrictions on $\beta$ will be $\hat\sigma^2 = (y - Xb)'(y - Xb)/n$ and likewise for a restricted estimator $\hat\sigma_*^2 = (y - Xb_*)'(y - Xb_*)/n = e_*'e_*/n$. The concentrated log-likelihood^13 will be

$$\ln L_c = -\frac{n}{2}[1 + \ln 2\pi + \ln(e'e/n)]$$

and likewise for the restricted case. If we insert these in the definition of LR, then we obtain

$$LR = n\ln[e_*'e_*/e'e] = n(\ln\hat\sigma_*^2 - \ln\hat\sigma^2) = n\ln(\hat\sigma_*^2/\hat\sigma^2). \tag{17-43}$$

The Lagrange multiplier (LM) test is based on the gradient of the log-likelihood function. The principle of the test is that if the hypothesis is valid, then at the restricted estimator, the derivatives of the log-likelihood function should be close to zero. There are two ways to carry out the LM test. The log-likelihood function can be maximized subject to a set of restrictions by using

$$\ln L_{LM} = -\frac{n}{2}\left[\ln 2\pi + \ln\sigma^2 + \frac{[(y - X\beta)'(y - X\beta)]/n}{\sigma^2}\right] + \lambda'c(\beta).$$

The first-order conditions for a solution are

$$\begin{bmatrix} \dfrac{\partial \ln L_{LM}}{\partial\beta} \\[2ex] \dfrac{\partial \ln L_{LM}}{\partial\sigma^2} \\[2ex] \dfrac{\partial \ln L_{LM}}{\partial\lambda} \end{bmatrix} = \begin{bmatrix} \dfrac{X'(y - X\beta)}{\sigma^2} + C(\beta)'\lambda \\[2ex] \dfrac{-n}{2\sigma^2} + \dfrac{(y - X\beta)'(y - X\beta)}{2\sigma^4} \\[2ex] c(\beta) \end{bmatrix} = \begin{bmatrix} \mathbf{0} \\ 0 \\ \mathbf{0} \end{bmatrix}. \tag{17-44}$$

The solutions to these equations give the restricted least squares estimator, $b_*$; the usual variance estimator, now $e_*'e_*/n$; and the Lagrange multipliers. There are now two ways to compute the test statistic. In the setting of the classical linear regression model, when we actually compute the Lagrange multipliers, a convenient way to proceed is to test the hypothesis that the multipliers equal zero. For this model, the solution for $\lambda_*$ is $\lambda_* = [R(X'X)^{-1}R']^{-1}(Rb - q)$. This equation is a linear function of the least squares estimator. If we carry out a Wald test of the hypothesis that $\lambda_*$ equals $\mathbf{0}$, then the statistic will be

$$LM = \lambda_*'\{\text{Est.Var}[\lambda_*]\}^{-1}\lambda_* = (Rb - q)'[Rs_*^2(X'X)^{-1}R']^{-1}(Rb - q). \tag{17-45}$$

The disturbance variance estimator, $s_*^2$, based on the restricted slopes is $e_*'e_*/n$.

An alternative way to compute the LM statistic often produces interesting results. In most situations, we maximize the log-likelihood function without actually computing the vector of Lagrange multipliers. (The restrictions are usually imposed some other way.) An alternative way to compute the statistic is based on the (general) result that under the hypothesis being tested,

$$E[\partial\ln L/\partial\beta] = E[(1/\sigma^2)X'\varepsilon] = \mathbf{0}$$

and

$$\text{Asy.Var}[\partial\ln L/\partial\beta] = -E[\partial^2\ln L/\partial\beta\,\partial\beta']^{-1} = \sigma^2(X'X)^{-1}.^{14} \tag{17-46}$$

^13 See Section E.6.3.

^14 This makes use of the fact that the Hessian is block diagonal.

We can test the hypothesis that at the restricted estimator, the derivatives are equal to zero. The statistic would be

$$LM = \frac{e_*'X(X'X)^{-1}X'e_*}{e_*'e_*/n} = nR_*^2. \tag{17-47}$$

In this form, the LM statistic is $n$ times the coefficient of determination in a regression of the residuals $e_{i*} = (y_i - x_i'b_*)$ on the full set of regressors.

With some manipulation we can show that $W = [n/(n - K)]JF$ and LR and LM are approximately equal to this function of $F$.^15 All three statistics converge to $JF$ as $n$ increases. The linear model is a special case in that the LR statistic is based only on the unrestricted estimator and does not actually require computation of the restricted least squares estimator, although computation of $F$ does involve most of the computation of $b_*$. Since the log function is concave, and $W/n \geq \ln(1 + W/n)$, Godfrey (1988) also shows that $W \geq LR \geq LM$, so for the linear model, we have a firm ranking of the three statistics.

^15 See Godfrey (1988, pp. 49–51).

There is ample evidence that the asymptotic results for these statistics are problematic in small or moderately sized samples. [See, e.g., Davidson and MacKinnon (1993, pp. 456–457).] The true distributions of all three statistics involve the data and the unknown parameters and, as suggested by the algebra, converge to the $F$ distribution from above. The implication is that critical values from the chi-squared distribution are likely to be too small; that is, using the limiting chi-squared distribution in small or moderately sized samples is likely to exaggerate the significance of empirical results. Thus, in applications, the more conservative $F$ statistic (or $t$ for one restriction) is likely to be preferable unless one's data are plentiful.

17.6.2 MAXIMUM LIKELIHOOD ESTIMATION OF NONLINEAR REGRESSION MODELS

In Chapter 9, we considered nonlinear regression models in which the nonlinearity in the parameters appeared entirely on the right-hand side of the equation. There are models in which parameters appear nonlinearly in functions of the dependent variable as well.

Suppose that, in general, the model is

$$g(y_i, \theta) = h(x_i, \beta) + \varepsilon_i.$$

One approach to estimation would be least squares, minimizing

$$S(\theta, \beta) = \sum_{i=1}^n [g(y_i, \theta) - h(x_i, \beta)]^2.$$

There is no reason to expect this nonlinear least squares estimator to be consistent, however, though it is difficult to show this analytically. The problem is that nonlinear least squares ignores the Jacobian of the transformation. Davidson and MacKinnon (1993, p. 244) suggest a qualitative argument, which we can illustrate with an example. Suppose $y$ is positive, $g(y, \theta) = \exp(\theta y)$ and $h(x, \beta) = \beta x$. In this case, an obvious "solution" is $\beta = 0$ and $\theta \to -\infty$, which produces a sum of squares of zero. "Estimation" becomes a nonissue. For this type of regression model, however, maximum likelihood estimation is consistent, efficient, and generally not appreciably more difficult than least squares.

For normally distributed disturbances, the density of $y_i$ is

$$f(y_i) = \left|\frac{\partial\varepsilon_i}{\partial y_i}\right| (2\pi\sigma^2)^{-1/2} e^{-[g(y_i,\theta) - h(x_i,\beta)]^2/(2\sigma^2)}.$$

The Jacobian of the transformation [see (B-41)] is

$$J(y_i, \theta) = \left|\frac{\partial\varepsilon_i}{\partial y_i}\right| = \left|\frac{\partial g(y_i, \theta)}{\partial y_i}\right| = J_i.$$

After collecting terms, the log-likelihood function will be

$$\ln L = \sum_{i=1}^n -\frac{1}{2}[\ln 2\pi + \ln\sigma^2] + \sum_{i=1}^n \ln J(y_i, \theta) - \frac{\sum_{i=1}^n [g(y_i, \theta) - h(x_i, \beta)]^2}{2\sigma^2}. \tag{17-48}$$

In many cases, including the applications considered here, there is an inconsistency in the model in that the transformation of the dependent variable may rule out some values. Hence, the assumed normality of the disturbances cannot be strictly correct. In the generalized production function, there is a singularity at $y_i = 0$ where the Jacobian becomes infinite. Some research has been done on specific modifications of the model to accommodate the restriction [e.g., Poirier (1978) and Poirier and Melino (1978)], but in practice, the typical application involves data for which the constraint is inconsequential.

But for the Jacobians, nonlinear least squares would be maximum likelihood. If the Jacobian terms involve $\theta$, however, then least squares is not maximum likelihood. As regards $\sigma^2$, this likelihood function is essentially the same as that for the simpler nonlinear regression model. The maximum likelihood estimator of $\sigma^2$ will be

$$\hat\sigma^2 = \frac{1}{n}\sum_{i=1}^n [g(y_i, \hat\theta) - h(x_i, \hat\beta)]^2 = \frac{1}{n}\sum_{i=1}^n e_i^2. \tag{17-49}$$

The likelihood equations for the unknown parameters are

$$\begin{bmatrix} \dfrac{\partial \ln L}{\partial\beta} \\[2ex] \dfrac{\partial \ln L}{\partial\theta} \\[2ex] \dfrac{\partial \ln L}{\partial\sigma^2} \end{bmatrix} = \begin{bmatrix} \dfrac{1}{\sigma^2}\displaystyle\sum_{i=1}^n \varepsilon_i \dfrac{\partial h(x_i, \beta)}{\partial\beta} \\[2ex] \displaystyle\sum_{i=1}^n \dfrac{1}{J_i}\dfrac{\partial J_i}{\partial\theta} - \dfrac{1}{\sigma^2}\displaystyle\sum_{i=1}^n \varepsilon_i \dfrac{\partial g(y_i, \theta)}{\partial\theta} \\[2ex] \dfrac{-n}{2\sigma^2} + \dfrac{1}{2\sigma^4}\displaystyle\sum_{i=1}^n \varepsilon_i^2 \end{bmatrix} = \begin{bmatrix} \mathbf{0} \\ \mathbf{0} \\ 0 \end{bmatrix}. \tag{17-50}$$

These equations will usually be nonlinear, so a solution must be obtained iteratively. One special case that is common is a model in which $\theta$ is a single parameter. Given a particular value of $\theta$, we would maximize $\ln L$ with respect to $\beta$ by using nonlinear least squares. [It would be simpler yet if,
in addition, $h(x_i, \beta)$ were linear so that we could use linear least squares. See the following application.] Therefore, a way to maximize $L$ for all the parameters is to scan over values of $\theta$ for the one that, with the associated least squares estimates of $\beta$ and $\sigma^2$, gives the highest value of $\ln L$. (Of course, this requires that we know roughly what values of $\theta$ to examine.)

If $\theta$ is a vector of parameters, then direct maximization of $L$ with respect to the full set of parameters may be preferable. (Methods of maximization are discussed in Appendix E.) There is an additional simplification that may be useful. Whatever values are ultimately obtained for the estimates of $\theta$ and $\beta$, the estimate of $\sigma^2$ will be given by (17-49). If we insert this solution in (17-48), then we obtain the concentrated log-likelihood,

$$\ln L_c = \sum_{i=1}^n \ln J(y_i, \theta) - \frac{n}{2}[1 + \ln(2\pi)] - \frac{n}{2}\ln\left[\frac{1}{n}\sum_{i=1}^n \varepsilon_i^2\right]. \tag{17-51}$$

This equation is a function only of $\theta$ and $\beta$. We can maximize it with respect to $\theta$ and $\beta$ and obtain the estimate of $\sigma^2$ as a by-product. (See Section E.6.3 for details.)

An estimate of the asymptotic covariance matrix of the maximum likelihood estimators can be obtained by inverting the estimated information matrix. It is quite likely, however, that the Berndt et al. (1974) estimator will be much easier to compute. The log of the density for the $i$th observation is the $i$th term in (17-48). The derivatives of $\ln L_i$ with respect to the unknown parameters are

$$g_i = \begin{bmatrix} \partial\ln L_i/\partial\beta \\ \partial\ln L_i/\partial\theta \\ \partial\ln L_i/\partial\sigma^2 \end{bmatrix} = \begin{bmatrix} (\varepsilon_i/\sigma^2)[\partial h(x_i, \beta)/\partial\beta] \\ (1/J_i)[\partial J_i/\partial\theta] - (\varepsilon_i/\sigma^2)[\partial g(y_i, \theta)/\partial\theta] \\ (1/(2\sigma^2))\left[\varepsilon_i^2/\sigma^2 - 1\right] \end{bmatrix}. \tag{17-52}$$

The asymptotic covariance matrix for the maximum likelihood estimators is estimated using

$$\text{Est.Asy.Var}[\text{MLE}] = \left[\sum_{i=1}^n \hat g_i\hat g_i'\right]^{-1} = (\hat G'\hat G)^{-1}. \tag{17-53}$$

Note that the preceding includes a row and a column for $\sigma^2$ in the covariance matrix. In a model that transforms $y$ as well as $x$, the Hessian of the log-likelihood is generally not block diagonal with respect to $\theta$ and $\sigma^2$. When $y$ is transformed, the maximum likelihood estimators of $\theta$ and $\sigma^2$ are positively correlated, because both parameters reflect the scaling of the dependent variable in the model. This result may seem counterintuitive. Consider the difference in the variance estimators that arises when a linear and a loglinear model are estimated. The variance of $\ln y$ around its mean is obviously different from that of $y$ around its mean. By contrast, consider what happens when only the independent variables are transformed, for example, by the Box–Cox transformation. The slope estimators vary accordingly, but in such a way that the variance of $y$ around its conditional mean will stay constant.^16

^16 See Seaks and Layson (1983).

Example 17.5 A Generalized Production Function

The Cobb–Douglas function has often been used to study production and cost. Among the assumptions of this model is that the average cost of production increases or decreases monotonically with increases in output. This assumption is in direct contrast to the standard textbook treatment of a U-shaped average cost curve as well as to a large amount of empirical evidence. (See Example 7.3 for a well-known application.) To relax this assumption, Zellner and Revankar (1970) proposed a generalization of the Cobb–Douglas production function.^17 Their model allows economies of scale to vary with output and to increase and then decrease as output rises:

$$\ln y + \theta y = \ln\gamma + \alpha(1 - \delta)\ln K + \alpha\delta\ln L + \varepsilon.$$

Note that the right-hand side of their model is intrinsically linear according to the results of Section 7.3.3. The model as a whole, however, is intrinsically nonlinear due to the parametric transformation of $y$ appearing on the left.

^17 An alternative approach is to model costs directly with a flexible functional form such as the translog model. This approach is examined in detail in Chapter 14.

For Zellner and Revankar's production function, the Jacobian of the transformation from $\varepsilon_i$ to $y_i$ is $\partial\varepsilon_i/\partial y_i = (\theta + 1/y_i)$. Some simplification is achieved by writing this as $(1 + \theta y_i)/y_i$. The log-likelihood is then

$$\ln L = \sum_{i=1}^n \ln(1 + \theta y_i) - \sum_{i=1}^n \ln y_i - \frac{n}{2}\ln(2\pi) - \frac{n}{2}\ln\sigma^2 - \frac{1}{2\sigma^2}\sum_{i=1}^n \varepsilon_i^2,$$

where $\varepsilon_i = (\ln y_i + \theta y_i - \beta_1 - \beta_2\ln\text{capital}_i - \beta_3\ln\text{labor}_i)$. Estimation of this model is straightforward. For a given value of $\theta$, $\beta$ and $\sigma^2$ are estimated by linear least squares. Therefore, to estimate the full set of parameters, we could scan over the range of zero to one for $\theta$. The value of $\theta$ that, with its associated least squares estimates of $\beta$ and $\sigma^2$, maximizes the log-likelihood function provides the maximum likelihood estimate. This procedure was used by Zellner and Revankar. The results given in Table 17.2 were obtained by maximizing the log-likelihood function directly, instead. The statewide data on output, capital, labor, and number of establishments in the transportation industry used in Zellner and Revankar's study are given in Appendix Table F9.2 and Example 16.6. For this application, $y$ = value added per firm, $K$ = capital per firm, and $L$ = labor per firm.

TABLE 17.2 Generalized Production Function Estimates

              Maximum Likelihood                          Nonlinear
              Estimate       SE(1)        SE(2)           Least Squares
  β1           2.914822      0.44912      0.12534          2.108925
  β2           0.350068      0.10019      0.094354         0.257900
  β3           1.092275      0.16070      0.11498          0.878388
  θ            0.106666      0.078702                     −0.031634
  σ²           0.0427427     0.0151167
  ε′ε          1.068567                                    0.7655490
  ln L        −8.939044                                  −13.621256

Maximum likelihood and nonlinear least squares estimates are shown in Table 17.2. The asymptotic standard errors for the maximum likelihood estimates are labeled SE(1). These are computed using the BHHH form of the asymptotic covariance matrix. The second set, SE(2), are computed treating the estimate of $\theta$ as fixed; they are the usual linear least squares results using $(\ln y + \theta y)$ as the dependent variable in a linear regression. Clearly, these results would be very misleading. The final column of Table 17.2 lists the simple nonlinear least squares estimates. No standard errors are given, because there is no appropriate formula for computing the asymptotic covariance matrix. The sum of squares does not provide an appropriate method for computing the pseudoregressors for the parameters in the transformation. The last two rows of the table display the sum of squares and the log-likelihood function evaluated at the parameter estimates. As expected, the log-likelihood is much larger at the maximum likelihood estimates. In contrast, the nonlinear least squares estimates lead to a much lower sum of squares; least squares is still least squares.

Example 17.6 An LM Test for (Log-)Linearity

A natural generalization of the Box–Cox regression model (Section 9.3.2) is

$$y^{(\lambda)} = \beta'x^{(\lambda)} + \varepsilon, \tag{17-54}$$

where $z^{(\lambda)} = (z^{\lambda} - 1)/\lambda$. This form includes the linear ($\lambda = 1$) and loglinear ($\lambda = 0$) models as special cases. The Jacobian of the transformation is $|d\varepsilon/dy| = y^{\lambda - 1}$. The log-likelihood function for the model with normally distributed disturbances is

$$\ln L = -\frac{n}{2}\ln(2\pi) - \frac{n}{2}\ln\sigma^2 + (\lambda - 1)\sum_{i=1}^n \ln y_i - \frac{1}{2\sigma^2}\sum_{i=1}^n \left(y_i^{(\lambda)} - \beta'x_i^{(\lambda)}\right)^2. \tag{17-55}$$

The MLEs
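of $\lambda$ and $\beta$ are computed by maximizing this function. The estimator of $\sigma^2$ is the mean squared residual as usual. We can use a one-dimensional grid search over $\lambda$: for a given value of $\lambda$, the MLE of $\beta$ is least squares using the transformed data. It must be remembered, however, that the criterion function includes the Jacobian term.

We will use the BHHH estimator of the asymptotic covariance matrix for the maximum likelihood estimator. The derivatives of the log-likelihood are

$$\begin{bmatrix} \dfrac{\partial \ln L}{\partial\beta} \\[2ex] \dfrac{\partial \ln L}{\partial\lambda} \\[2ex] \dfrac{\partial \ln L}{\partial\sigma^2} \end{bmatrix} = \sum_{i=1}^n \begin{bmatrix} \dfrac{\varepsilon_i x_i^{(\lambda)}}{\sigma^2} \\[2ex] \ln y_i - \dfrac{\varepsilon_i}{\sigma^2}\left[\dfrac{\partial y_i^{(\lambda)}}{\partial\lambda} - \displaystyle\sum_{k=1}^K \beta_k\dfrac{\partial x_{ik}^{(\lambda)}}{\partial\lambda}\right] \\[2ex] \dfrac{1}{2\sigma^2}\left[\dfrac{\varepsilon_i^2}{\sigma^2} - 1\right] \end{bmatrix} = \sum_{i=1}^n g_i \tag{17-56}$$

where

$$\frac{\partial\,[z^{\lambda} - 1]/\lambda}{\partial\lambda} = \frac{\lambda z^{\lambda}\ln z - (z^{\lambda} - 1)}{\lambda^2} = \frac{1}{\lambda}\left(z^{\lambda}\ln z - z^{(\lambda)}\right). \tag{17-57}$$

(See Exercise 6 in Chapter 9.) The estimator of the asymptotic covariance matrix for the maximum likelihood estimator is given in (17-53).

The Box–Cox model provides a framework for a specification test of linearity versus log-linearity. To assemble this result, consider first the basic model

$$y = f(x, \beta_1, \beta_2, \lambda) + \varepsilon = \beta_1 + \beta_2 x^{(\lambda)} + \varepsilon.$$

The pseudoregressors are $x_1^* = 1$, $x_2^* = x^{(\lambda)}$, $x_3^* = \beta_2(\partial x^{(\lambda)}/\partial\lambda)$ as given above. We now consider a Lagrange multiplier test of the hypothesis that $\lambda$ equals zero. The test is carried out by first regressing $y$ on a constant and $\ln x$ (i.e., the regressor evaluated at $\lambda = 0$) and then computing $nR_*^2$ in the regression of the residuals from this first regression on $x_1^*$, $x_2^*$, and $x_3^*$, also evaluated at $\lambda = 0$. The first and second of these are 1 and $\ln x$. To obtain the third, we require $x_3^*|_{\lambda=0} = \beta_2\lim_{\lambda\to 0}(\partial x^{(\lambda)}/\partial\lambda)$. Applying L'Hôpital's rule to the right-hand side of (17-57), differentiate numerator and denominator with respect to $\lambda$. This produces

$$\lim_{\lambda\to 0}\frac{\partial x^{(\lambda)}}{\partial\lambda} = \lim_{\lambda\to 0}\left[x^{\lambda}(\ln x)^2 - \frac{\partial x^{(\lambda)}}{\partial\lambda}\right] = \frac{1}{2}\lim_{\lambda\to 0} x^{\lambda}(\ln x)^2 = \frac{1}{2}(\ln x)^2.$$

Therefore, $\lim_{\lambda\to 0} x_3^* = \beta_2[\tfrac{1}{2}(\ln x)^2]$. The Lagrange multiplier test is carried out in two steps. First, we regress $y$ on a constant and $\ln x$ and compute the residuals. Second, we regress these residuals on a constant, $\ln x$, and $b_2(\tfrac{1}{2}\ln^2 x)$, where $b_2$ is the coefficient on $\ln x$ in the first regression. The Lagrange multiplier statistic is $nR^2$ from the second regression. To generalize this procedure to several regressors, we would use the logs of all the regressors at the first step. Then, the additional regressor for the second regression would be

$$x_{\lambda}^* = \sum_{k=1}^K b_k\left(\tfrac{1}{2}\ln^2 x_k\right),$$

where the sum is taken over all the variables that are transformed in the original model and the $b_k$'s are the least squares coefficients in the first regression.

By extending this process to the model of (17-54), we can devise a bona fide test of log-linearity (against the more general model, not linearity). [See Davidson and MacKinnon (1985). A test of linearity can be conducted using $\lambda = 1$, instead.] Computing the various terms at $\lambda = 0$ again, we have

$$\hat\varepsilon_i = \ln y_i - \hat\beta_1 - \hat\beta_2\ln x_i,$$

where as before, $\hat\beta_1$ and $\hat\beta_2$ are computed by the least squares regression of $\ln y$ on a constant and $\ln x$. Let $\hat\varepsilon_i^* = \tfrac{1}{2}\ln^2 y_i - \hat\beta_2(\tfrac{1}{2}\ln^2 x_i)$. Then

$$\hat g_i = \begin{bmatrix} \hat\varepsilon_i/\hat\sigma^2 \\ (\ln x_i)\hat\varepsilon_i/\hat\sigma^2 \\ \ln y_i - \hat\varepsilon_i\hat\varepsilon_i^*/\hat\sigma^2 \\ \left(\hat\varepsilon_i^2/\hat\sigma^2 - 1\right)/(2\hat\sigma^2) \end{bmatrix}.$$

If there are $K$ regressors in the model, then the second component in $\hat g_i$ will be a vector containing the logs of the variables, whereas $\hat\varepsilon_i^*$ in the third becomes

$$\hat\varepsilon_i^* = \frac{1}{2}\ln^2 y_i - \sum_{k=1}^K \hat\beta_k\left(\frac{1}{2}\ln^2 x_{ik}\right).$$

Using the Berndt et al. estimator given in (17-53), we can now construct the Lagrange multiplier statistic as

$$LM = \chi^2[1] = \left(\sum_{i=1}^n \hat g_i\right)'\left[\sum_{i=1}^n \hat g_i\hat g_i'\right]^{-1}\left(\sum_{i=1}^n \hat g_i\right) = \mathbf{i}'G(G'G)^{-1}G'\mathbf{i},$$

where $G$ is the $n \times (K + 2)$ matrix whose columns are $g_1$ through $g_{K+2}$ and $\mathbf{i}$ is a column of 1s. The usefulness of this approach for either of the models we have examined is that in testing the hypothesis, it is not necessary to compute the nonlinear, unrestricted, Box–Cox regression.

17.6.3 NONNORMAL DISTURBANCES: THE STOCHASTIC FRONTIER MODEL

This final application will examine a regressionlike model in which the disturbances do not have a normal distribution. The model developed here also presents a convenient platform on which to illustrate the use of the invariance property of maximum likelihood estimators to simplify the estimation of the model.

A lengthy literature commencing with theoretical work by Knight (1933), Debreu (1951), and Farrell (1957) and the pioneering empirical study by Aigner, Lovell, and Schmidt (1977) has been directed at models of production that specifically account for the textbook proposition that a production function is a theoretical ideal.^18 If $y = f(x)$ defines a production relationship between inputs, $x$, and an output, $y$, then for any given $x$, the observed value of $y$ must be less than or equal to $f(x)$. The implication for an empirical regression model is that in a formulation such as $y = h(x, \beta) + u$, $u$ must be negative. Since the theoretical production function is an ideal (the frontier of efficient production), any nonzero disturbance must be interpreted as the result of inefficiency. A strictly orthodox interpretation embedded in a Cobb–Douglas production model might produce an empirical frontier production model such as

$$\ln y = \beta_1 + \sum_k \beta_k\ln x_k - u, \qquad u \geq 0.$$

The gamma model described in Example 5.1 was an application. One-sided disturbances such as this one present a particularly difficult estimation problem. The primary theoretical problem is that any measurement error in $\ln y$ must be embedded in the disturbance. The practical problem is that the entire estimated function becomes a slave to any single errantly measured data point.

^18 A survey by Greene (1997b) appears in Pesaran and Schmidt (1997). Kumbhakar and Lovell (2000) is a comprehensive reference on the subject.

Aigner, Lovell, and Schmidt proposed instead a formulation within which observed deviations from the production function could arise from two sources: (1) productive inefficiency as we have defined it above and that would necessarily be negative; and (2) idiosyncratic effects that are specific to the firm and that could enter the model with either sign. The end result was what they labeled the "stochastic frontier":

$$\ln y = \beta_1 + \sum_k \beta_k\ln x_k - u + v, \qquad u \geq 0,\; v \sim N[0, \sigma_v^2],$$
$$\phantom{\ln y} = \beta_1 + \sum_k \beta_k\ln x_k + \varepsilon.$$

The frontier for any particular firm is $h(x, \beta) + v$, hence the name stochastic frontier. The inefficiency term is $u$, a random variable of particular interest in this setting. Since the data are in log terms, $u$ is a measure of the percentage by which the particular observation fails to achieve the frontier, ideal production rate.

To complete the specification, they suggested two possible distributions for the inefficiency term, the absolute value of a normally distributed variable and an exponentially distributed variable. The density functions for these two compound distributions are given by Aigner, Lovell, and Schmidt; let $\varepsilon = v - u$, $\lambda = \sigma_u/\sigma_v$, $\sigma = (\sigma_u^2 + \sigma_v^2)^{1/2}$, and $\Phi(z)$ = the probability to the left of $z$ in the standard normal distribution [see Sections B.4.1 and E.5.6]. For the "half-normal" model,

$$\ln h(\varepsilon_i \mid \beta, \lambda, \sigma) = -\ln\sigma - \frac{1}{2}\log\frac{2}{\pi} - \frac{1}{2}\left(\frac{\varepsilon_i}{\sigma}\right)^2 + \ln\Phi\left(\frac{-\varepsilon_i\lambda}{\sigma}\right),$$

whereas for the exponential model

$$\ln h(\varepsilon_i \mid \beta, \theta, \sigma_v) = \ln\theta + \frac{1}{2}\theta^2\sigma_v^2 + \theta\varepsilon_i + \ln\Phi\left(-\frac{\varepsilon_i}{\sigma_v} - \theta\sigma_v\right).$$

Both these distributions are asymmetric. We thus have a regression model with a nonnormal distribution specified for the disturbance. The disturbance, $\varepsilon$, has a nonzero mean as well; $E[\varepsilon] = -\sigma_u(2/\pi)^{1/2}$ for the half-normal model and $-1/\theta$ for the exponential model. Figure 17.3 illustrates the density for the half-normal model with $\sigma = 1$ and $\lambda = 2$. By writing $\beta_0 = \beta_1 + E[\varepsilon]$ and $\varepsilon^* = \varepsilon - E[\varepsilon]$, we obtain a more conventional formulation

$$\ln y = \beta_0 + \sum_k \beta_k\ln x_k + \varepsilon^*$$

which does have a disturbance with a zero mean but an asymmetric, nonnormal distribution. The asymmetry of the distribution of $\varepsilon^*$ does not negate our basic results for least squares in this classical regression model. This model satisfies the assumptions of the Gauss–Markov theorem, so least squares is unbiased and consistent (save for the constant term), and efficient among linear unbiased estimators. In this model, however, the maximum likelihood estimator is not linear, and it is more efficient than least squares.

[FIGURE 17.3 Density for the Disturbance in the Stochastic Frontier Model. Plot title: Probability Density for the Stochastic Frontier; vertical axis: Density.]

We will work through maximum likelihood estimation of the half-normal model in detail to illustrate the technique. The log-likelihood is

$$\ln L = -n\ln\sigma - \frac{n}{2}\ln\frac{2}{\pi} - \frac{1}{2}\sum_{i=1}^n \left(\frac{\varepsilon_i}{\sigma}\right)^2 + \sum_{i=1}^n \ln\Phi\left(\frac{-\varepsilon_i\lambda}{\sigma}\right).$$

This is not a particularly difficult log-likelihood to maximize numerically. Nonetheless, it is instructive to make use of a convenience that we noted earlier. Recall that maximum likelihood estimators are invariant to one-to-one transformation. If we let $\theta = 1/\sigma$ and $\gamma = (1/\sigma)\beta$, the log-likelihood function becomes

$$\ln L = n\ln\theta - \frac{n}{2}\ln\frac{2}{\pi} - \frac{1}{2}\sum_{i=1}^n (\theta y_i - \gamma'x_i)^2 + \sum_{i=1}^n \ln\Phi[-\lambda(\theta y_i - \gamma'x_i)].$$

As you could verify by trying the derivations, this transformation brings a dramatic simplification in the manipulation of the log-likelihood and its derivatives. We will make repeated use of the functions

$$\alpha_i = \varepsilon_i/\sigma = \theta y_i - \gamma'x_i,$$
$$\delta(y_i, x_i, \lambda, \theta, \gamma) = \frac{\phi[-\lambda\alpha_i]}{\Phi[-\lambda\alpha_i]} = \delta_i,$$
$$\Delta_i = -\delta_i(-\lambda\alpha_i + \delta_i).$$

(The second of these is the derivative of the function in the final term in $\log L$. The third is the derivative of $\delta_i$ with respect to its argument; $\Delta_i < 0$ for all values of $\lambda\alpha_i$.) It will also be convenient to define the $(K + 1) \times 1$ column vectors $z_i = (x_i', -y_i)'$ and $t_i = (\mathbf{0}', 1/\theta)'$. The likelihood equations are

$$\frac{\partial \ln L}{\partial(\gamma', \theta)'} = \sum_{i=1}^n t_i + \sum_{i=1}^n \alpha_i z_i + \lambda\sum_{i=1}^n \delta_i z_i = \mathbf{0},$$
$$\frac{\partial \ln L}{\partial\lambda} = -\sum_{i=1}^n \delta_i\alpha_i = 0$$

and the second derivatives are

$$H(\gamma, \theta, \lambda) = \sum_{i=1}^n \left\{ \begin{bmatrix} (\lambda^2\Delta_i - 1)z_i z_i' & (\delta_i - \lambda\alpha_i\Delta_i)z_i \\ (\delta_i - \lambda\alpha_i\Delta_i)z_i' & \alpha_i^2\Delta_i \end{bmatrix} - \begin{bmatrix} t_i t_i' & \mathbf{0} \\ \mathbf{0}' & 0 \end{bmatrix} \right\}.$$

The estimator of the asymptotic covariance matrix for the directly estimated parameters is

$$\text{Est.Asy.Var}[\hat\gamma', \hat\theta, \hat\lambda]' = \left\{-H[\hat\gamma', \hat\theta, \hat\lambda]\right\}^{-1}.$$

There are two sets of transformations of the parameters in our formulation. In order to recover estimates of the original structural parameters $\sigma = 1/\theta$ and $\beta = \gamma/\theta$ we need only transform the MLEs. Since these transformations are one to one, the MLEs of $\sigma$ and $\beta$ are $1/\hat\theta$ and $\hat\gamma/\hat\theta$. To compute an asymptotic covariance matrix for these estimators we will use the delta method, which will use the derivative matrix

$$G = \begin{bmatrix} \partial\hat\beta/\partial\hat\gamma' & \partial\hat\beta/\partial\hat\theta & \partial\hat\beta/\partial\hat\lambda \\ \partial\hat\sigma/\partial\hat\gamma' & \partial\hat\sigma/\partial\hat\theta & \partial\hat\sigma/\partial\hat\lambda \\ \partial\hat\lambda/\partial\hat\gamma' & \partial\hat\lambda/\partial\hat\theta & \partial\hat\lambda/\partial\hat\lambda \end{bmatrix} = \begin{bmatrix} (1/\hat\theta)I & -(1/\hat\theta^2)\hat\gamma & \mathbf{0} \\ \mathbf{0}' & -(1/\hat\theta^2) & 0 \\ \mathbf{0}' & 0 & 1 \end{bmatrix}.$$

Then, for the recovered parameters, we use

$$\text{Est.Asy.Var}[\hat\beta', \hat\sigma, \hat\lambda]' = G \times \left\{-H[\hat\gamma', \hat\theta, \hat\lambda]\right\}^{-1} \times G'.$$

For the half-normal model, we would also rely on the invariance of maximum likelihood estimators to recover estimates of the deeper variance parameters, $\sigma_v^2 = \sigma^2/(1 + \lambda^2)$ and $\sigma_u^2 = \sigma^2\lambda^2/(1 + \lambda^2)$.

The stochastic frontier model is a bit different from those we have analyzed previously in that the disturbance is the central focus of the analysis rather than the catchall for the unknown and unknowable factors omitted from the equation. Ideally, we would like to estimate $u_i$ for each firm in the sample to compare them on the basis of their productive efficiency. (The parameters of the production function are usually of secondary interest in these studies.) Unfortunately, the data do not permit a direct estimate, since with estimates of $\beta$ in hand, we are only able to compute a direct estimate of $\varepsilon = y - x'\beta$. Jondrow et al. (1982), however, have derived a useful approximation that is now the standard measure in these settings,

$$E[u \mid \varepsilon] = \frac{\sigma\lambda}{1 + \lambda^2}\left[\frac{\phi(z)}{1 - \Phi(z)} - z\right], \qquad z = \frac{\varepsilon\lambda}{\sigma},$$

for the half-normal model, and

$$E[u \mid \varepsilon] = z + \sigma_v\frac{\phi(z/\sigma_v)}{\Phi(z/\sigma_v)}, \qquad z = \varepsilon - \theta\sigma_v^2$$

for the exponential model. These values can be computed using the maximum likelihood estimates of the structural parameters in the model. In addition, a structural parameter of interest is the proportion of the total variance of $\varepsilon$ that is due to the inefficiency term. For the half-normal model, $\text{Var}[\varepsilon] = \text{Var}[u] + \text{Var}[v] = (1 - 2/\pi)\sigma_u^2 + \sigma_v^2$, whereas for the exponential model, the counterpart is $1/\theta^2 + \sigma_v^2$.

Example 17.7 Stochastic Frontier Model

Appendix Table F9.2 lists 25 statewide observations used by Zellner and Revankar (1970) to study production in the transportation equipment manufacturing industry. We have used these data to estimate the stochastic frontier models. Results are shown in Table 17.3.^19 The Jondrow, et al. (1982) estimates of the inefficiency terms are listed in Table 17.4. The estimates of the parameters of the production function, $\beta_1$, $\beta_2$, and $\beta_3$ are fairly similar, but the variance parameters, $\sigma_u$ and $\sigma_v$, appear to be quite different. Some of the parameter difference is illusory, however. The variance components for the half-normal model are $(1 - 2/\pi)\sigma_u^2 = 0.0179$ and $\sigma_v^2 = 0.0361$, whereas those for the exponential model are $1/\theta^2 = 0.0183$ and $\sigma_v^2 = 0.0293$. In each case, about one-third of the total variance of $\varepsilon$ is accounted for by the variance of $u$.

TABLE 17.3 Estimated Stochastic Frontier Functions

                 Least Squares                 Half-Normal Model             Exponential Model
                           Standard                      Standard                      Standard
  Coefficient    Estimate  Error     t Ratio   Estimate  Error     t Ratio   Estimate  Error     t Ratio
  Constant       1.844     0.234     7.896     2.081     0.422     4.933     2.069     0.290     7.135
  βk             0.245     0.107     2.297     0.259     0.144     1.800     0.262     0.120     2.184
  βl             0.805     0.126     6.373     0.780     0.170     4.595     0.770     0.138     5.581
  σ              0.236                         0.282     0.087     3.237
  σu             —                             0.222                         0.136
  σv             —                             0.190                         0.171     0.054     3.170
  λ              —                             1.265     1.620     0.781
  θ              —                                                           7.398     3.931     1.882
  log L          2.2537                        2.4695                        2.8605

17.6.4 CONDITIONAL MOMENT TESTS OF SPECIFICATION

A spate of studies has shown how to use conditional moment restrictions for specification testing as well as estimation.^20 The logic of the conditional moment (CM) based specification test is as follows. The model specification implies that certain moment restrictions will hold in the population from which the data were drawn. If the specification

^19 N is the number of establishments in the state. Zellner and Revankar used per establishment data in their study. The stochastic frontier model has the intriguing property that if the least squares residuals are skewed in the positive direction, then least squares with $\lambda = 0$ maximizes the log-likelihood. This property, in fact, characterizes the data above when scaled by N. Since that leaves a not particularly interesting example and it does not occur when the data are not normalized, for purposes of this illustration we have used the unscaled data to produce Table 17.3. Wedon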
otethatthisresultisacommon,vexingoccurrenceinpractice.20See,forexample,PaganandVella(1989).\nGreene-50240bookJune26,200215:8506CHAPTER17✦MaximumLikelihoodEstimationTABLE17.4EstimatedInefficienciesStateHalf-NormalExponentialStateHalf-NormalExponentialAlabama0.20110.1459Maryland0.13530.0925California0.14480.0972Massachusetts0.15640.1093Connecticut0.19030.1348Michigan0.15810.1076Florida0.51750.5903Missouri0.10290.0704Georgia0.10400.0714NewJersey0.09580.0659Illinois0.12130.0830NewYork0.27790.2225Indiana0.21130.1545Ohio0.22910.1698Iowa0.24930.2007Pennsylvania0.15010.1030Kansas0.10100.0686Texas0.20300.1455Kentucky0.05630.0415Virginia0.14000.0968Louisiana0.20330.1507Washington0.11050.0753Maine0.22260.1725WestVirginia0.15560.1124Wisconsin0.14070.0971iscorrect,thenthesampledatashouldmimictheimpliedrelationships.Forexample,intheclassicalregressionmodel,theassumptionofhomoscedasticityimpliesthatthedisturbancevarianceisindependentoftheregressors.Assuch,Ex[(y−βx)2−σ2]=Exε2−σ2=0.iiiiiIf,ontheotherhand,theregressionisheteroscedasticinawaythatdependsonxi,thenthiscovariancewillnotbezero.Ifthehypothesisofhomoscedasticityiscorrect,thenwewouldexpectthesamplecounterparttothemomentcondition,1nr¯=xe2−s2,iini=1whereeiistheOLSresidual,tobeclosetozero.(ThiscomputationappearsinBreuschandPagan’sLMtestforhomoscedasticity.SeeSection11.4.3.)Thepracticalproblemstobesolvedare(1)toformulatesuitablemomentconditionsthatdocorrespondtothehypothesistest,whichisusuallystraightforward;(2)todevisetheappropriatesamplecounterpart;and(3)todeviseasuitablemeasureofclosenesstozeroofthesamplemomentestimator.ThelastofthesewillbeintheframeworkoftheWaldstatisticsthatwehaveexaminedatvariouspointsinthisbook.Sotheproblemwillbetodevisetheappropriatecovariancematrixforthesamplemoments.Considerageneralcaseinwhichthemomentconditioniswrittenintermsofvari-ablesinthemodel[yi,xi,zi]andparameters(asinthelinearregressionmodel)θˆ.Thesamplemomentcanbewritten1n1nr¯=ri(yi,xi,zi,θˆ)=rˆi.(17-58)nni=1i=1Thehypothesisisthatbasedonthetrue
$\theta$, $E[r_i]=0$. Under the null hypothesis that $E[r_i]=0$ and assuming that $\text{plim}\,\hat\theta=\theta$ and that a central limit theorem (Theorem D.18 or D.19) applies to $\sqrt{n}\,\bar r(\theta)$ so that

\[
\sqrt{n}\,\bar r(\theta)\xrightarrow{d}N[0,\Sigma]
\]

for some covariance matrix $\Sigma$ that we have yet to estimate, it follows that the Wald statistic,

\[
n\,\bar r'\hat\Sigma^{-1}\bar r\xrightarrow{d}\chi^{2}(J),\qquad\text{(17-59)}
\]

where the degrees of freedom $J$ is the number of moment restrictions being tested and $\hat\Sigma$ is an estimate of $\Sigma$. Thus, the statistic can be referred to the chi-squared table.

It remains to determine the estimator of $\Sigma$. The full derivation of $\Sigma$ is fairly complicated. [See Pagan and Vella (1989, pp. S32–S33).] But when the vector of parameter estimators is a maximum likelihood estimator, as it would be for the least squares estimator with normally distributed disturbances and for most of the other estimators we consider, a surprisingly simple estimator can be used. Suppose that the parameter vector used to compute the moments above is obtained by solving the equations

\[
\frac{1}{n}\sum_{i=1}^{n}g(y_i,x_i,z_i,\hat\theta)=\frac{1}{n}\sum_{i=1}^{n}\hat g_i=0,\qquad\text{(17-60)}
\]

where $\hat\theta$ is the estimated parameter vector [e.g., $(\hat\beta,\hat\sigma)$ in the linear model]. For the linear regression model, that would be the normal equations

\[
\frac{1}{n}X'e=\frac{1}{n}\sum_{i=1}^{n}x_i(y_i-x_i'b)=0.
\]

Let the matrix $G$ be the $n\times K$ matrix with $i$th row equal to $\hat g_i'$. In a maximum likelihood problem, $G$ is the matrix of derivatives of the individual terms in the log-likelihood function with respect to the parameters. This is the $G$ used to compute the BHHH estimator of the information matrix. [See (17-18).] Let $R$ be the $n\times J$ matrix whose $i$th row is $\hat r_i'$. Pagan and Vella show that for maximum likelihood estimators, $\Sigma$ can be estimated using

\[
S=\frac{1}{n}\left[R'R-R'G(G'G)^{-1}G'R\right].\text{[21]}\qquad\text{(17-61)}
\]

This equation looks like an involved matrix computation, but it is simple with any regression program. Each element of $S$ is the mean square or cross-product of the least squares residuals in a linear regression of a column of $R$ on the variables in $G$.[22] Therefore, the operational version of the statistic is

\[
C=n\,\bar r'S^{-1}\bar r=i'R\left[R'R-R'G(G'G)^{-1}G'R\right]^{-1}R'i,\qquad\text{(17-62)}
\]

where $i$ is an $n\times 1$ column of ones, so that $\bar r=(1/n)R'i$, and which, once again, is referred to the appropriate critical value in the chi-squared table. This result provides a joint test that all the moment conditions are satisfied simultaneously. An individual test of just one of the moment restrictions in isolation can be computed even more easily than a joint test. For testing one of the $J$ conditions, say the $\ell$th one, the test can be carried out by a simple $t$ test of whether the constant term is zero in a linear regression of the $\ell$th column of $R$ on a constant term and all the columns of $G$. In fact, the test statistic in (17-62) could also be obtained by stacking the $J$ columns of $R$ and treating the $J$ equations as a seemingly unrelated regressions model with $(i,G)$ as the (identical) regressors in each equation and then testing the joint hypothesis that all the constant terms are zero. (See Section 14.2.3.)

[21] It might be tempting just to use $(1/n)R'R$. This idea would be incorrect, because $S$ accounts for $R$ being a function of the estimated parameter vector that is converging to its probability limit at the same rate as the sample moments are converging to theirs.

[22] If the estimator is not an MLE, then estimation of $\Sigma$ is more involved but also straightforward using basic matrix algebra. The advantage of (17-62) is that it involves simple sums of variables that have already been computed to obtain $\hat\theta$ and $\bar r$. Note, as well, that if $\theta$ has been estimated by maximum likelihood, then the term $(G'G)^{-1}$ is the BHHH estimator of the asymptotic covariance matrix of $\hat\theta$. If it were more convenient, then this estimator could be replaced with any other appropriate estimator of $\text{Asy.Var}[\hat\theta]$.

Example 17.8 Testing for Heteroscedasticity in the Linear Regression Model

Suppose that the linear model is specified as

\[
y_i=\beta_1+\beta_2x_i+\beta_3z_i+\varepsilon_i.
\]

To test whether

\[
E[z_i^{2}(\varepsilon_i^{2}-\sigma^{2})]=0,
\]

we linearly regress $z_i^{2}(e_i^{2}-s^{2})$ on a constant, $e_i$, $x_ie_i$, and $z_ie_i$. A standard $t$ test of whether the constant term in this regression is zero carries out the test. To test the joint hypothesis that there is no heteroscedasticity with respect to both $x$ and $z$, we would regress both $x_i^{2}(e_i^{2}-s^{2})$ and $z_i^{2}(e_i^{2}-s^{2})$ on $[1,e_i,x_ie_i,z_ie_i]$ and collect the two columns of residuals in $V$. Then $S=(1/n)V'V$. The moment vector would be

\[
\bar r=\frac{1}{n}\sum_{i=1}^{n}\begin{bmatrix}x_i^{2}\\ z_i^{2}\end{bmatrix}(e_i^{2}-s^{2}).
\]

The test statistic would now be

\[
C=n\,\bar r'S^{-1}\bar r=n\,\bar r'\left[\frac{1}{n}V'V\right]^{-1}\bar r.
\]

We will examine other conditional moment tests using this method in Section 22.3.4 where we study the specification of the censored regression model.

17.7 TWO-STEP MAXIMUM LIKELIHOOD ESTIMATION

The applied literature contains a large and increasing number of models in which one
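model is embedded in another, which produces what are broadly known as "two-step" estimation problems. Consider an (admittedly contrived) example in which we have the following.

Model 1. Expected number of children $=E[y_1\mid x_1,\theta_1]$.
Model 2. Decision to enroll in job training $=y_2$, a function of $(x_2,\theta_2,E[y_1\mid x_1,\theta_1])$.

There are two parameter vectors, $\theta_1$ and $\theta_2$. The first appears in the second model, although not the reverse. In such a situation, there are two ways to proceed. Full information maximum likelihood (FIML) estimation would involve forming the joint distribution $f(y_1,y_2\mid x_1,x_2,\theta_1,\theta_2)$ of the two random variables and then maximizing the full log-likelihood function,

\[
\ln L=\sum_{i=1}^{n}\ln f(y_{i1},y_{i2}\mid x_{i1},x_{i2},\theta_1,\theta_2).
\]

A second, or two-step, limited information maximum likelihood (LIML) procedure for this kind of model could be done by estimating the parameters of model 1, since it does not involve $\theta_2$, and then maximizing a conditional log-likelihood function using the estimates from Step 1:

\[
\ln\hat L=\sum_{i=1}^{n}\ln f[y_{i2}\mid x_{i2},\theta_2,(x_{i1},\hat\theta_1)].
\]

There are at least two reasons one might proceed in this fashion. First, it may be straightforward to formulate the two separate log-likelihoods, but very complicated to derive the joint distribution. This situation frequently arises when the two variables being modeled are from different kinds of populations, such as one discrete and one continuous (which is a very common case in this framework). The second reason is that maximizing the separate log-likelihoods may be fairly straightforward, but maximizing the joint log-likelihood may be numerically complicated or difficult.[23] We will consider a few examples. Although we will encounter FIML problems at various points later in the book, for now we will present some basic results for two-step estimation. Proofs of the results given here can be found in an important reference on the subject, Murphy and Topel (1985).

[23] There is a third possible motivation. If either model is misspecified, then the FIML estimates of both models will be inconsistent. But if only the second is misspecified, at least the first will be estimated consistently. Of course, this result is only "half a loaf," but it may be better than none.

Suppose, then, that our model consists of the two marginal distributions, $f_1(y_1\mid x_1,\theta_1)$ and $f_2(y_2\mid x_1,x_2,\theta_1,\theta_2)$. Estimation proceeds in two steps.

1. Estimate $\theta_1$ by maximum likelihood in Model 1. Let $\hat V_1$ be $n$ times any of the estimators of the asymptotic covariance matrix of this estimator that were discussed in Section 17.4.6, so that $(1/n)\hat V_1$ estimates the asymptotic covariance matrix of $\hat\theta_1$.
2. Estimate $\theta_2$ by maximum likelihood in Model 2, with $\hat\theta_1$ inserted in place of $\theta_1$ as if it were known. Let $\hat V_2$ be $n$ times any appropriate estimator of the asymptotic covariance matrix of $\hat\theta_2$, so that $(1/n)\hat V_2$ is the uncorrected estimator.

The argument for consistency of $\hat\theta_2$ is essentially that if $\theta_1$ were known, then all our results for MLEs would apply for estimation of $\theta_2$, and since $\text{plim}\,\hat\theta_1=\theta_1$, asymptotically, this line of reasoning is correct. But the same line of reasoning is not sufficient to justify using $(1/n)\hat V_2$ as the estimator of the asymptotic covariance matrix of $\hat\theta_2$. Some correction is necessary to account for an estimate of $\theta_1$ being used in estimation of $\theta_2$. The essential result is the following.

THEOREM 17.8 Asymptotic Distribution of the Two-Step MLE [Murphy and Topel (1985)]

If the standard regularity conditions are met for both log-likelihood functions, then the second-step maximum likelihood estimator of $\theta_2$ is consistent and asymptotically normally distributed with asymptotic covariance matrix

\[
V_2^{*}=\frac{1}{n}\left[V_2+V_2\left[CV_1C'-RV_1C'-CV_1R'\right]V_2\right],
\]

where

\[
V_1=\text{Asy.Var}[\sqrt{n}(\hat\theta_1-\theta_1)]\ \text{based on}\ \ln L_1,\qquad
V_2=\text{Asy.Var}[\sqrt{n}(\hat\theta_2-\theta_2)]\ \text{based on}\ \ln L_2\mid\theta_1,
\]

\[
C=E\left[\frac{1}{n}\left(\frac{\partial\ln L_2}{\partial\theta_2}\right)\left(\frac{\partial\ln L_2}{\partial\theta_1'}\right)\right],\qquad
R=E\left[\frac{1}{n}\left(\frac{\partial\ln L_2}{\partial\theta_2}\right)\left(\frac{\partial\ln L_1}{\partial\theta_1'}\right)\right].
\]

The correction of the asymptotic covariance matrix at the second step requires some additional computation. Matrices $V_1$ and $V_2$ are estimated by the respective uncorrected covariance matrices. Typically, the BHHH estimators,

\[
\hat V_1=\left[\frac{1}{n}\sum_{i=1}^{n}\left(\frac{\partial\ln f_{i1}}{\partial\hat\theta_1}\right)\left(\frac{\partial\ln f_{i1}}{\partial\hat\theta_1'}\right)\right]^{-1}
\]

and

\[
\hat V_2=\left[\frac{1}{n}\sum_{i=1}^{n}\left(\frac{\partial\ln f_{i2}}{\partial\hat\theta_2}\right)\left(\frac{\partial\ln f_{i2}}{\partial\hat\theta_2'}\right)\right]^{-1}
\]

are used. The matrices $R$ and $C$ are obtained by summing the individual observations on the cross products of the derivatives. These are estimated with

\[
\hat C=\frac{1}{n}\sum_{i=1}^{n}\left(\frac{\partial\ln f_{i2}}{\partial\hat\theta_2}\right)\left(\frac{\partial\ln f_{i2}}{\partial\hat\theta_1'}\right)
\]

and

\[
\hat R=\frac{1}{n}\sum_{i=1}^{n}\left(\frac{\partial\ln f_{i2}}{\partial\hat\theta_2}\right)\left(\frac{\partial\ln f_{i1}}{\partial\hat\theta_1'}\right).
\]

Example 17.9 Two-Step ML Estimation

Continuing the example discussed at the beginning of this section, we suppose that $y_{i2}$ is a binary indicator of the choice whether to enroll in the program ($y_{i2}=1$) or not ($y_{i2}=0$) and that the probabilities of the two outcomes are

\[
\text{Prob}[y_{i2}=1\mid x_{i1},x_{i2}]=\frac{e^{x_{i2}'\beta+\gamma E[y_{i1}\mid x_{i1}]}}{1+e^{x_{i2}'\beta+\gamma E[y_{i1}\mid x_{i1}]}}
\]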
and

\[
\text{Prob}[y_{i2}=0\mid x_{i1},x_{i2}]=1-\text{Prob}[y_{i2}=1\mid x_{i1},x_{i2}],
\]

where $x_{i2}$ is some covariates that might influence the decision, such as marital status or age, and $x_{i1}$ are determinants of family size. This setup is a logit model. We will develop this model more fully in Chapter 21. The expected value of $y_{i1}$ appears in the probability. (Remark: The expected, rather than the actual value was chosen deliberately. Otherwise, the models would differ substantially. In our case, we might view the difference as that between an ex ante decision and an ex post one.) Suppose that the number of children can be described by a Poisson distribution (see Section B.4.8) dependent on some variables $x_{i1}$ such as education, age, and so on. Then

\[
\text{Prob}[y_{i1}=j\mid x_{i1}]=\frac{e^{-\lambda_i}\lambda_i^{j}}{j!},\qquad j=0,1,\ldots,
\]

and suppose, as is customary, that

\[
E[y_{i1}]=\lambda_i=\exp(x_{i1}'\delta).
\]

The models involve $\theta=[\delta,\beta,\gamma]$, where $\theta_1=\delta$. In fact, it is unclear what the joint distribution of $y_1$ and $y_2$ might be, but two-step estimation is straightforward. For model 1, the log-likelihood and its first derivatives are

\[
\ln L_1=\sum_{i=1}^{n}\ln f_1(y_{i1}\mid x_{i1},\delta)
=\sum_{i=1}^{n}\left[-\lambda_i+y_{i1}\ln\lambda_i-\ln y_{i1}!\right]
=\sum_{i=1}^{n}\left[-\exp(x_{i1}'\delta)+y_{i1}(x_{i1}'\delta)-\ln y_{i1}!\right],
\]

\[
\frac{\partial\ln L_1}{\partial\delta}=\sum_{i=1}^{n}(y_{i1}-\lambda_i)x_{i1}=\sum_{i=1}^{n}u_ix_{i1}.
\]

Computation of the estimates is developed in Chapter 21. Any of the three estimators of $V_1$ is also easy to compute, but the BHHH estimator is most convenient, so we use

\[
\hat V_1=\left[\frac{1}{n}\sum_{i=1}^{n}\hat u_i^{2}x_{i1}x_{i1}'\right]^{-1}.
\]

[In this and the succeeding summations, we are actually estimating expectations of the various matrices.] We can write the density function for the second model as

\[
f_2(y_{i2}\mid x_{i1},x_{i2},\beta,\gamma,\delta)=P_i^{\,y_{i2}}\times(1-P_i)^{1-y_{i2}},
\]

where $P_i=\text{Prob}[y_{i2}=1\mid x_{i1},x_{i2}]$ as given earlier. Then

\[
\ln L_2=\sum_{i=1}^{n}y_{i2}\ln P_i+(1-y_{i2})\ln(1-P_i).
\]

For convenience, let $\hat x_{i2}^{*}=[x_{i2}',\exp(x_{i1}'\hat\delta)]'$, and recall that $\theta_2=[\beta',\gamma]'$. Then

\[
\ln\hat L_2=\sum_{i=1}^{n}y_{i2}\left[\hat x_{i2}^{*\prime}\theta_2-\ln(1+\exp(\hat x_{i2}^{*\prime}\theta_2))\right]+(1-y_{i2})\left[-\ln(1+\exp(\hat x_{i2}^{*\prime}\theta_2))\right].
\]

So, at the second step, we create the additional variable, append it to $x_{i2}$, and estimate the logit model as if $\delta$ (and this additional variable) were actually observed instead of estimated. The maximum likelihood estimates of $[\beta,\gamma]$ are obtained by maximizing this function. (See Chapter 21.) After a bit of manipulation, we find the convenient result that

\[
\frac{\partial\ln\hat L_2}{\partial\theta_2}=\sum_{i=1}^{n}(y_{i2}-P_i)\hat x_{i2}^{*}=\sum_{i=1}^{n}v_i\hat x_{i2}^{*}.
\]

Once again, any of the three estimators could be used for estimating the asymptotic covariance matrix, but the BHHH estimator is convenient, so we use

\[
\hat V_2=\left[\frac{1}{n}\sum_{i=1}^{n}\hat v_i^{2}\hat x_{i2}^{*}\hat x_{i2}^{*\prime}\right]^{-1}.
\]

For the final step, we must correct the asymptotic covariance matrix using $\hat C$ and $\hat R$. What remains to derive (the few lines are left for the reader) is

\[
\frac{\partial\ln L_2}{\partial\delta}=\sum_{i=1}^{n}v_i[\gamma\exp(x_{i1}'\delta)]x_{i1}.
\]

So, using our estimates,

\[
\hat C=\frac{1}{n}\sum_{i=1}^{n}\hat v_i^{2}[\hat\gamma\exp(x_{i1}'\hat\delta)]\hat x_{i2}^{*}x_{i1}',\qquad\text{and}\qquad
\hat R=\frac{1}{n}\sum_{i=1}^{n}\hat u_i\hat v_i\hat x_{i2}^{*}x_{i1}'.
\]

We can now compute the correction.

In many applications, the covariance of the two gradients $R$ converges to zero. When the first and second step estimates are based on different samples, $R$ is exactly zero. For example, in our application above, $R=\sum_{i=1}^{n}u_iv_ix_{i2}^{*}x_{i1}'$. The two "residuals," $u$ and $v$, may well be uncorrelated. This assumption must be checked on a model-by-model basis, but in such an instance, the third and fourth terms in $V_2^{*}$ vanish asymptotically and what remains is the simpler alternative,

\[
V_2^{**}=(1/n)\left[V_2+V_2CV_1C'V_2\right].
\]

We will examine some additional applications of this technique (including an empirical implementation of the preceding example) later in the book. Perhaps the most common application of two-step maximum likelihood estimation in the current literature, especially in regression analysis, involves inserting a prediction of one variable into a function that describes the behavior of another.

17.8 MAXIMUM SIMULATED LIKELIHOOD ESTIMATION

The technique of maximum simulated likelihood (MSL) is essentially a classical sampling theory counterpart to the hierarchical Bayesian estimator we considered in Section 16.2.4. Since the celebrated paper of Berry, Levinsohn, and Pakes (1995), and a related literature advocated by McFadden and Train (2000), maximum simulated likelihood estimation has been used in a large and growing number of studies based on log-likelihoods that involve integrals that are expectations.[24] In this section, we will lay out some general results for MSL estimation by developing a particular application, the random parameters model. This general modeling framework has been used in the majority of the received applications. We will then continue the application to the discrete choice model for panel data that we began in Section 16.2.4.

[24] A major reference for this set of techniques is Gourieroux and Monfort (1996).

The density of $y_{it}$ when the parameter vector is $\beta_i$ is $f(y_{it}\mid x_{it},\beta_i)$. The parameter vector $\beta_i$ is randomly distributed over individuals according to

\[
\beta_i=\beta+\Delta z_i+v_i,
\]

where $\beta+\Delta z_i$ is the mean of the distribution, which depends on time invariant individual characteristics as well as parameters yet to be estimated, and the random variation comes from the individual heterogeneity, $v_i$. This random vector is assumed to have mean zero and covariance matrix, $\Sigma$. The conditional density of the parameters is denoted

\[
g(\beta_i\mid z_i,\beta,\Delta,\Sigma)=g(v_i+\beta+\Delta z_i,\Sigma),
\]

where $g(\cdot)$ is the underlying marginal density of the heterogeneity. For the $T$ observations in group $i$, the joint conditional density is

\[
f(y_i\mid X_i,\beta_i)=\prod_{t=1}^{T}f(y_{it}\mid x_{it},\beta_i).
\]

The unconditional density for $y_i$ is obtained by integrating over $\beta_i$,

\[
f(y_i\mid X_i,z_i,\beta,\Delta,\Sigma)=E_{\beta_i}[f(y_i\mid X_i,\beta_i)]=\int_{\beta_i}f(y_i\mid X_i,\beta_i)\,g(\beta_i\mid z_i,\beta,\Delta,\Sigma)\,d\beta_i.
\]

Collecting terms, and making the transformation from $v_i$ to $\beta_i$, the true log-likelihood would be

\[
\ln L=\sum_{i=1}^{n}\ln\left\{\int_{v_i}\left[\prod_{t=1}^{T}f(y_{it}\mid x_{it},\beta+\Delta z_i+v_i)\right]g(v_i\mid\Sigma)\,dv_i\right\}
=\sum_{i=1}^{n}\ln\left\{\int_{v_i}f(y_i\mid X_i,\beta+\Delta z_i+v_i)\,g(v_i\mid\Sigma)\,dv_i\right\}.
\]

Each of the $n$ terms involves an expectation over $v_i$. The end result of the integration is a function of $(\beta,\Delta,\Sigma)$ which is then maximized.

As in the previous applications, it will not be possible to maximize the log-likelihood in this form because there is no closed form for the integral. We have considered two approaches to maximizing such a log-likelihood. In the latent class formulation, it is assumed that the parameter vector takes one of a discrete set of values, and the log-likelihood is maximized over this discrete distribution as well as the structural parameters. (See Section 16.2.3.) The hierarchical Bayes procedure used Markov Chain–Monte Carlo methods to sample from the joint posterior distribution of the underlying parameters and used the empirical mean of the sample of draws as the estimator. We now consider a third approach to estimating the parameters of a model of this form, maximum simulated likelihood estimation.

The terms in the log-likelihood are each of the form

\[
\ln L_i=\ln E_{v_i}[f(y_i\mid X_i,\beta+\Delta z_i+v_i)].
\]

As noted, we do not have a closed form for this function, so we cannot compute it directly. Suppose we could sample randomly from the distribution of $v_i$. If an appropriate law
of large numbers can be applied, then

\[
\lim_{R\to\infty}\frac{1}{R}\sum_{r=1}^{R}f(y_i\mid X_i,\beta+\Delta z_i+v_{ir})=E_{v_i}[f(y_i\mid X_i,\beta+\Delta z_i+v_i)],
\]

where $v_{ir}$ is the $r$th random draw from the distribution. This suggests a strategy for computing the log-likelihood. We can substitute this approximation to the expectation into the log-likelihood function. With sufficient random draws, the approximation can be made as close to the true function as desired. [The theory for this approach is discussed in Gourieroux and Monfort (1996), Bhat (1999), and Train (1999, 2002). Practical details on applications of the method are given in Greene (2001).] A detail to add concerns how to sample from the distribution of $v_i$. There are many possibilities, but for now, we consider the simplest case, the multivariate normal distribution. Write $\Sigma$ in the Cholesky form $\Sigma=LL'$ where $L$ is a lower triangular matrix. Now, let $u_{ir}$ be a vector of $K$ independent draws from the standard normal distribution. Then a draw from the multivariate distribution with covariance matrix $\Sigma$ is simply $v_{ir}=Lu_{ir}$. The simulated log-likelihood is

\[
\ln L_S=\sum_{i=1}^{n}\ln\left\{\frac{1}{R}\sum_{r=1}^{R}\left[\prod_{t=1}^{T}f(y_{it}\mid x_{it},\beta+\Delta z_i+Lu_{ir})\right]\right\}.
\]

The resulting function is maximized with respect to $\beta$, $\Delta$ and $L$. This is obviously not a simple calculation, but it is feasible, and much easier than trying to manipulate the integrals directly. In fact, for most problems to which this method has been applied, the computations are surprisingly simple. The intricate part is obtaining the function and its derivatives. But, the functions are usually index function models that involve $x_{it}'\beta_i$ which greatly simplifies the derivations.

Inference in this setting does not involve any new results. The estimated asymptotic covariance matrix for the estimated parameters is computed by manipulating the derivatives of the simulated log-likelihood. The Wald and likelihood ratio statistics are also computed the way they would usually be. As before, we are interested in estimating person specific parameters. A prior estimate might simply use $\beta+\Delta z_i$, but this would not use all the information in the sample. A posterior estimate would compute

\[
\hat E_{v_i}[\beta_i\mid\beta,\Delta,z_i,\Sigma]=\frac{\sum_{r=1}^{R}\hat\beta_{ir}\,f(y_i\mid X_i,\hat\beta_{ir})}{\sum_{r=1}^{R}f(y_i\mid X_i,\hat\beta_{ir})},\qquad
\hat\beta_{ir}=\hat\beta+\hat\Delta z_i+\hat Lu_{ir}.
\]

Mechanical details on computing the MSLE are omitted. The interested reader is referred to Gourieroux and Monfort (1996), Train (2000, 2002), and Greene (2001, 2002) for details.

Example 17.10 Maximum Simulated Likelihood Estimation of a Binary Choice Model

We continue Example 16.5 where estimates of a binary choice model for product innovation are obtained. The model is for $\text{Prob}[y_{it}=1\mid x_{it},\beta_i]$ where

$y_{it}=1$ if firm $i$ realized a product innovation in year $t$ and 0 if not.

The independent variables in the model are

$x_{it1}$ = constant,
$x_{it2}$ = log of sales,
$x_{it3}$ = relative size = ratio of employment in business unit to employment in the industry,
$x_{it4}$ = ratio of industry imports to (industry sales + imports),
$x_{it5}$ = ratio of industry foreign direct investment to (industry sales + imports),
$x_{it6}$ = productivity = ratio of industry value added to industry employment,
$x_{it7}$ = dummy variable indicating the firm is in the raw materials sector,
$x_{it8}$ = dummy variable indicating the firm is in the investment goods sector.

The sample consists of 1,270 German manufacturing firms observed for five years, 1984–1988. The density that enters the log-likelihood is

\[
f(y_{it}\mid x_{it},\beta_i)=\text{Prob}[y_{it}\mid x_{it}'\beta_i]=\Phi[(2y_{it}-1)x_{it}'\beta_i],\qquad y_{it}=0,1,
\]

where

\[
\beta_i=\beta+v_i,\qquad v_i\sim N[0,\Sigma].
\]

To be consistent with Bertschek and Lechner (1998) we did not fit any firm-specific, time-invariant components in the main equation for $\beta_i$.

Table 17.5 presents the estimated coefficients for the basic probit model in the first column. The estimates of the means, $\beta$, are shown in the second column. There appear to be large differences in the parameter estimates, though this can be misleading since there is large variation across the firms in the posterior estimates. The third column presents the square roots of the implied diagonal elements of $\Sigma$ computed as the diagonal elements of $LL'$. These estimated standard deviations are for the underlying distribution of the parameter in the model; they are not estimates of the standard deviation of the sampling distribution of the estimator. For the mean parameter, that is shown in parentheses in the second column. The fourth column presents the sample means and standard deviations of the 1,270 estimated posterior estimates of the coefficients. The last column repeats the estimates for the latent class model. The agreement in the two sets of estimates is striking in view of the crude approximation given by the latent class model.

TABLE 17.5 Estimated Random Parameters Model

            Probit            RP Means          RP Std. Devs.  Empirical Distn.  Posterior
Constant    -1.96 (0.23)      -3.91 (0.20)      2.70           -3.27 (0.57)      -3.38 (2.14)
ln Sales     0.18 (0.022)      0.36 (0.019)     0.28            0.32 (0.15)       0.34 (0.09)
Rel. Size    1.07 (0.14)       6.01 (0.22)      5.99            3.33 (2.25)       2.58 (1.30)
Import       1.13 (0.15)       1.51 (0.13)      0.84            2.01 (0.58)       1.81 (0.74)
FDI          2.85 (0.40)       3.81 (0.33)      6.51            3.76 (1.69)       3.63 (1.98)
Prod.       -2.34 (0.72)      -5.10 (0.73)      13.03          -8.15 (8.29)      -5.48 (1.78)
Raw Mtls    -0.28 (0.081)     -0.31 (0.075)     1.65           -0.18 (0.57)      -0.08 (0.37)
Invest.      0.19 (0.039)      0.27 (0.032)     1.42            0.27 (0.38)       0.29 (0.13)
ln L        -4114.05          -3498.654

Figures 17.4a and b present kernel density estimators of the firm-specific probabilities computed at the 5-year means for the random parameters model and with the original probit estimates. The estimated probabilities are strikingly similar to the latent class model, and also fairly similar to, though smoother than the probit estimates.

FIGURE 17.4a Probit Probabilities. [Kernel density estimate for PPR; vertical axis: Density; horizontal axis: PPR.]

FIGURE 17.4b Random Parameters Probabilities. [Kernel density estimate for PRI; vertical axis: Density; horizontal axis: PRI.]

Figure 17.5 shows the kernel density estimate for the firm-specific estimates of the log sales coefficient. The comparison to Figure 16.5 shows some striking difference. The random parameters model produces estimates that are similar in magnitude, but the distributions are actually quite different. Which should be preferred? Only on the basis that the three point discrete latent class model is an approximation to the continuous variation model, we would prefer the latter.

FIGURE 17.5a Random Parameters, $\beta_{sales}$. [Kernel density estimate for BS; vertical axis: Density; horizontal axis: BS.]

FIGURE 17.5b Latent Class Model, $\beta_{sales}$. [Kernel density estimate for BSALES; vertical axis: Density; horizontal axis: BSALES.]

17.9 PSEUDO-MAXIMUM LIKELIHOOD ESTIMATION AND ROBUST ASYMPTOTIC COVARIANCE MATRICES

Maximum likelihood estimation requires complete
specification of the distribution of the observed random variable. If the correct distribution is something other than what we assume, then the likelihood function is misspecified and the desirable properties of the MLE might not hold. This section considers a set of results on an estimation approach that is robust to some kinds of model misspecification. For example, we have found that in a model, if the conditional mean function is $E[y\mid x]=x'\beta$, then certain estimators, such as least squares, are "robust" to specifying the wrong distribution of the disturbances. That is, LS is MLE if the disturbances are normally distributed, but we can still claim some desirable properties for LS, including consistency, even if the disturbances are not normally distributed. This section will discuss some results that relate to what happens if we maximize the "wrong" log-likelihood function, and for those cases in which the estimator is consistent despite this, how to compute an appropriate asymptotic covariance matrix for it.[25]

[25] The following will sketch a set of results related to this estimation problem. The important references on this subject are White (1982a); Gourieroux, Monfort, and Trognon (1984); Huber (1967); and Amemiya (1985). A recent work with a large amount of discussion on the subject is Mittelhammer et al. (2000). The derivations in these works are complex, and we will only attempt to provide an intuitive introduction to the topic.

Let $f(y_i\mid x_i,\beta)$ be the true probability density for a random variable $y_i$ given a set of covariates $x_i$ and parameter vector $\beta$. The log-likelihood function is $(1/n)\log L(\beta\mid y,X)=(1/n)\sum_{i=1}^{n}\log f(y_i\mid x_i,\beta)$. The MLE, $\hat\beta_{ML}$, is the sample statistic that maximizes this function. (The division of $\log L$ by $n$ does not affect the solution.) We maximize the log-likelihood function by equating its derivatives to zero, so the MLE is obtained by solving the set of empirical moment equations

\[
\frac{1}{n}\sum_{i=1}^{n}\frac{\partial\log f(y_i\mid x_i,\hat\beta_{ML})}{\partial\hat\beta_{ML}}=\frac{1}{n}\sum_{i=1}^{n}d_i(\hat\beta_{ML})=\bar d(\hat\beta_{ML})=0.
\]

The population counterpart to the sample moment equation is

\[
E\left[\frac{1}{n}\frac{\partial\log L}{\partial\beta}\right]=E\left[\frac{1}{n}\sum_{i=1}^{n}d_i(\beta)\right]=E[\bar d(\beta)]=0.
\]

Using what we know about GMM estimators, if $E[\bar d(\beta)]=0$, then $\hat\beta_{ML}$ is consistent and asymptotically normally distributed, with asymptotic covariance matrix equal to

\[
V_{ML}=[G(\beta)'G(\beta)]^{-1}G(\beta)'\{\text{Var}[\bar d(\beta)]\}G(\beta)[G(\beta)'G(\beta)]^{-1},
\]

where $G(\beta)=\text{plim}\,\partial\bar d(\beta)/\partial\beta'$. Since $\bar d(\beta)$ is the derivative vector, $G(\beta)$ is $1/n$ times the expected Hessian of $\log L$; that is, $(1/n)E[H(\beta)]=\bar H(\beta)$. As we saw earlier, $\text{Var}[\partial\log L/\partial\beta]=-E[H(\beta)]$. Collecting all seven appearances of $(1/n)E[H(\beta)]$, we obtain the familiar result $V_{ML}=\{-E[H(\beta)]\}^{-1}$. [All the $n$s cancel and $\text{Var}[\bar d]=-(1/n)\bar H(\beta)$.] Note that this result depends crucially on the result $\text{Var}[\partial\log L/\partial\beta]=-E[H(\beta)]$.

The maximum likelihood estimator is obtained by maximizing the function $\bar h_n(y,X,\beta)=(1/n)\sum_{i=1}^{n}\log f(y_i,x_i,\beta)$. This function converges to its expectation as $n\to\infty$. Since this function is the log-likelihood for the sample, it is also the case (not proven here) that as $n\to\infty$, it attains its unique maximum at the true parameter vector, $\beta$. (We used this result in proving the consistency of the maximum likelihood estimator.) Since $\text{plim}\,\bar h_n(y,X,\beta)=E[\bar h_n(y,X,\beta)]$, it follows (by interchanging differentiation and the expectation operation) that $\text{plim}\,\partial\bar h_n(y,X,\beta)/\partial\beta=E[\partial\bar h_n(y,X,\beta)/\partial\beta]$. But, if this function achieves its maximum at $\beta$, then it must be the case that $\text{plim}\,\partial\bar h_n(y,X,\beta)/\partial\beta=0$.

An estimator that is obtained by maximizing a criterion function is called an M estimator [Huber (1967)] or an extremum estimator [Amemiya (1985)]. Suppose that we obtain an estimator by maximizing some other function, $M_n(y,X,\beta)$ that, although not the log-likelihood function, also attains its unique maximum at the true $\beta$ as $n\to\infty$. Then the preceding argument might produce a consistent estimator with a known asymptotic distribution. For example, the log-likelihood for a linear regression model with normally distributed disturbances with different variances, $\sigma^{2}\omega_i$, is

\[
\bar h_n(y,X,\beta)=\frac{1}{n}\sum_{i=1}^{n}\left\{-\frac{1}{2}\left[\log(2\pi\sigma^{2}\omega_i)+\frac{(y_i-x_i'\beta)^{2}}{\sigma^{2}\omega_i}\right]\right\}.
\]

By maximizing this function, we obtain the maximum likelihood estimator. But we also examined another estimator, simple least squares, which maximizes $M_n(y,X,\beta)=-(1/n)\sum_{i=1}^{n}(y_i-x_i'\beta)^{2}$. As we showed earlier, least squares is consistent and asymptotically normally distributed even with this extension, so it qualifies as an M estimator of the sort we are considering here.

Now consider the general case. Suppose that we estimate $\beta$ by maximizing a criterion function

\[
M_n(y\mid X,\beta)=\frac{1}{n}\sum_{i=1}^{n}\log g(y_i\mid x_i,\beta).
\]

Suppose as well that $\text{plim}\,M_n(y,X,\beta)=E[M_n(y,X,\beta)]$ and that as $n\to\infty$, $E[M_n(y,X,\beta)]$ attains its unique maximum at $\beta$. Then, by the argument we used above for the MLE, $\text{plim}\,\partial M_n(y,X,\beta)/\partial\beta=E[\partial M_n(y,X,\beta)/\partial\beta]=0$. Once again, we have a set of moment equations for estimation. Let $\hat\beta_E$ be the estimator that maximizes $M_n(y,X,\beta)$. Then the estimator is defined by

\[
\frac{\partial M_n(y,X,\hat\beta_E)}{\partial\hat\beta_E}=\frac{1}{n}\sum_{i=1}^{n}\frac{\partial\log g(y_i\mid x_i,\hat\beta_E)}{\partial\hat\beta_E}=\bar m(\hat\beta_E)=0.
\]

Thus, $\hat\beta_E$ is a GMM estimator. Using the notation of our earlier discussion, $G(\hat\beta_E)$ is the symmetric Hessian of $E[M_n(y,X,\beta)]$, which we will denote $(1/n)E[H_M(\hat\beta_E)]=\bar H_M(\hat\beta_E)$. Proceeding as we did above to obtain $V_{ML}$, we find that the appropriate asymptotic covariance matrix for the extremum estimator would be

\[
V_E=[\bar H_M(\beta)]^{-1}\left(\frac{1}{n}\Phi\right)[\bar H_M(\beta)]^{-1},
\]

where $\Phi=\text{Var}[\partial\log g(y_i\mid x_i,\beta)/\partial\beta]$, and, as before, the asymptotic distribution is normal.

The Hessian in $V_E$ can easily be estimated by using its empirical counterpart,

\[
\text{Est.}[\bar H_M(\hat\beta_E)]=\frac{1}{n}\sum_{i=1}^{n}\frac{\partial^{2}\log g(y_i\mid x_i,\hat\beta_E)}{\partial\hat\beta_E\,\partial\hat\beta_E'}.
\]

But, $\Phi$ remains to be specified, and it is unlikely that we would know what function to use. The important difference is that in this case, the variance of the first derivatives vector need not equal the Hessian, so $V_E$ does not simplify. We can, however, consistently estimate $\Phi$ by using the sample variance of the first derivatives,

\[
\hat\Phi=\frac{1}{n}\sum_{i=1}^{n}\left[\frac{\partial\log g(y_i\mid x_i,\hat\beta)}{\partial\hat\beta}\right]\left[\frac{\partial\log g(y_i\mid x_i,\hat\beta)}{\partial\hat\beta'}\right].
\]

If this were the maximum likelihood estimator, then $\hat\Phi$ would be the BHHH estimator that we have used at several points. For example, for the least squares estimator in the heteroscedastic linear regression model, the criterion is $M_n(y,X,\beta)=-(1/n)\sum_{i=1}^{n}(y_i-x_i'\beta)^{2}$, the solution is $b$, $G(b)=(-2/n)X'X$, and

\[
\hat\Phi=\frac{1}{n}\sum_{i=1}^{n}[2x_i(y_i-x_i'\beta)][2x_i(y_i-x_i'\beta)]'=\frac{4}{n}\sum_{i=1}^{n}e_i^{2}x_ix_i'.
\]

Collecting terms, the 4s cancel and we are left precisely with the White estimator of (11-13)!

At this point, we consider the motivation for all this weighty theory. One disadvantage of maximum likelihood estimation is its requirement that the density of the observed random variable(s) be fully specified. The preceding discussion suggests that in some situations, we can make somewhat fewer assumptions about the distribution than a full specification would require. The extremum estimator is robust to some kinds of specification errors. One useful result to emerge from this derivation is an estimator for the asymptotic covariance matrix of the extremum estimator that is robust at least to some misspecification. In particular, if we obtain $\hat\beta_E$ by maximizing a criterion function that satisfies the other assumptions, then the appropriate estimator of the asymptotic covariance matrix is

\[
\text{Est.}V_E=\frac{1}{n}[\bar H(\hat\beta_E)]^{-1}\hat\Phi(\hat\beta_E)[\bar H(\hat\beta_E)]^{-1}.
\]

If $\hat\beta_E$ is the true MLE, then $V_E$ simplifies to $\{-[H(\hat\beta_E)]\}^{-1}$. In the current literature, this estimator has been called the "sandwich" estimator. There is a trend in the current literature to compute this estimator routinely, regardless of the likelihood function. It is worth noting that if the log-likelihood is not specified correctly, then the parameter estimators are likely to be inconsistent, save for the cases such as those noted below, so robust estimation of the asymptotic covariance matrix may be misdirected effort. But if the likelihood function is correct, then the sandwich estimator is unnecessary. This method is not a general patch for misspecified models. Not every likelihood function qualifies as a consistent extremum estimator for the parameters of interest in the model.

One might wonder at this point how likely it is that the conditions needed for all this to work will be met. There are applications in the literature in which this machinery has been used that probably do not meet these conditions, such as the tobit model of Chapter 22. We have seen one important case. Least squares in the generalized regression model passes the test. Another important application is models of "individual heterogeneity" in cross-section data. Evidence suggests that simple models often overlook unobserved sources of variation across individuals in cross sections, such as unmeasurable "family effects" in studies of earnings or employment. Suppose that the correct model for a variable is $h(y_i\mid x_i,v_i,\beta,\theta)$, where $v_i$ is a random term that is not observed and $\theta$ is a parameter of the distribution of $v$. The correct log-likelihood function is $\sum_i\log f(y_i\mid x_i,\beta,\theta)=\sum_i\log\int_v h(y_i\mid x_i,v_i,\beta,\theta)f(v_i)\,dv_i$. Suppose that we maximize some other pseudo-log-likelihood function, $\sum_i\log g(y_i\mid x_i,\beta)$, and then use the sandwich estimator to estimate the asymptotic covariance matrix of $\hat\beta$. Does this produce a consistent estimator of the true parameter vector? Surprisingly, sometimes it does, even though it has ignored the nuisance parameter, $\theta$. We saw one case, using OLS in the GR model with heteroscedastic disturbances. Inappropriately fitting a Poisson model when the negative binomial model is correct (see Section 21.9.3) is another case. For some specifications, using the wrong likelihood function in the probit model with proportions data (Section 21.4.6) is a third. [These two examples are suggested, with several others, by Gourieroux, Monfort, and Trognon (1984).] We do emphasize once again that the sandwich estimator, in and of itself, is not necessarily of any virtue if the likelihood function is misspecified and the other conditions for the M estimator are not met.

17.10 SUMMARY AND CONCLUSIONS

This chapter has presented the theory and several applications of maximum likelihood estimation, which is the most frequently used estimation technique in econometrics after least squares. The maximum likelihood estimators are consistent, asymptotically normally distributed, and efficient among estimators that have these properties. The drawback to the technique is that it requires a fully parametric, detailed specification of the data generating process. As such, it is vulnerable to misspecification problems. The next chapter considers GMM estimation techniques which are less parametric, but more robust to variation in the underlying data generating process.

Key Terms and Concepts

• Asymptotic efficiency
• Asymptotic normality
• Asymptotic variance
• BHHH estimator
• Box–Cox model
• Conditional moment restrictions
• Concentrated log-likelihood
• Consistency
• Cramér–Rao lower bound
• Efficient score
• Estimable parameters
• Full information maximum likelihood
• Identification
• Information matrix
• Information matrix equality
• Invariance
• Jacobian
• Lagrange multiplier test
• Likelihood equation
• Likelihood function
• Likelihood inequality
• Likelihood ratio test
• Limited information maximum likelihood
• Maximum likelihood estimator
• Nonlinear least squares
• Outer product of gradients estimator
• Regularity conditions
• Score test
• Stochastic frontier
• Two-step maximum likelihood
• Wald statistic
• Wald test

Exercises

1. Assume that the distribution of $x$ is $f(x)=1/\theta$, $0\le x\le\theta$. In random sampling from this distribution, prov
ethatthesamplemaximumisaconsistentestimatorofθ.Note:Youcanprovethatthemaximumisthemaximumlikelihoodestimatorofθ.Buttheusualpropertiesdonotapplyhere.Whynot?[Hint:Attempttoverifythattheexpectedfirstderivativeofthelog-likelihoodwithrespecttoθiszero.]2.Inrandomsamplingfromtheexponentialdistributionf(x)=(1/θ)e−x/θ,x≥0,θ>0,findthemaximumlikelihoodestimatorofθandobtaintheasymptoticdistributionofthisestimator.3.Mixturedistribution.Supposethatthejointdistributionofthetworandomvariablesxandyisθe−(β+θ)y(βy)xf(x,y)=,β,θ>0,y≥0,x=0,1,2,....x!a.Findthemaximumlikelihoodestimatorsofβandθandtheirasymptoticjointdistribution.b.Findthemaximumlikelihoodestimatorofθ/(β+θ)anditsasymptoticdistribution.c.Provethatf(x)isoftheformf(x)=γ(1−γ)x,x=0,1,2,...,andfindthemaximumlikelihoodestimatorofγanditsasymptoticdistribution.d.Provethatf(y|x)isoftheformλe−λy(λy)xf(y|x)=,y≥0,λ>0.x!Provethatf(y|x)integratesto1.Findthemaximumlikelihoodestimatorofλanditsasymptoticdistribution.[Hint:Intheconditionaldistribution,justcarrythexsalongasconstants.]e.Provethatf(y)=θe−θy,y≥0,θ>0.Findthemaximumlikelihoodestimatorofθanditsasymptoticvariance.f.Provethate−βy(βy)xf(x|y)=,x=0,1,2,...,β>0.x!Basedonthisdistribution,whatisthemaximumlikelihoodestimatorofβ?4.SupposethatxhastheWeibulldistributionβ−1−αxβf(x)=αβxe,x≥0,α,β>0.a.Obtainthelog-likelihoodfunctionforarandomsampleofnobservations.b.Obtainthelikelihoodequationsformaximumlikelihoodestimationofαandβ.Notethatthefirstprovidesanexplicitsolutionforαintermsofthedataandβ.But,afterinsertingthisinthesecond,weobtainonlyanimplicitsolutionforβ.Howwouldyouobtainthemaximumlikelihoodestimators?\nGreene-50240bookJune26,200215:8CHAPTER17✦MaximumLikelihoodEstimation523c.Obtainthesecondderivativesmatrixofthelog-likelihoodwithrespecttoαandβ.Theexactexpectationsoftheelementsinvolvingβinvolvethederivativesofthegammafunctionandarequitemessyanalytically.Ofcourse,yourexactresultprovidesanempiricalestimator.HowwouldyouestimatetheasymptoticcovariancematrixforyourestimatorsinPartb?d.Provethatαβ
Cov[lnx,xβ]=1.[Hint:Theexpectedfirstderivativesofthelog-likelihoodfunctionarezero.]5.ThefollowingdataweregeneratedbytheWeibulldistributionofExercise4:1.30430.492541.27421.40190.325560.299650.264231.08781.94610.476153.64540.153441.23570.963810.334531.12272.02961.27970.960802.0070a.Obtainthemaximumlikelihoodestimatesofαandβ,andestimatetheasymp-toticcovariancematrixfortheestimates.b.CarryoutaWaldtestofthehypothesisthatβ=1.c.Obtainthemaximumlikelihoodestimateofαunderthehypothesisthatβ=1.d.UsingtheresultsofPartsaandc,carryoutalikelihoodratiotestofthehypothesisthatβ=1.e.CarryoutaLagrangemultipliertestofthehypothesisthatβ=1.6.(LimitedInformationMaximumLikelihoodEstimation).Considerabivariatedistributionforxandythatisafunctionoftwoparameters,αandβ.Thejointdensityisf(x,y|α,β).Weconsidermaximumlikelihoodestimationofthetwoparameters.Thefullinformationmaximumlikelihoodestimatoristhenowfamil-iarmaximumlikelihoodestimatorofthetwoparameters.Now,supposethatwecanfactorthejointdistributionasdoneinExercise3,butinthiscase,wehavef(x,y|α,β)=f(y|x,α,β)f(x|α).Thatis,theconditionaldensityforyisafunc-tionofbothparameters,butthemarginaldistributionforxinvolvesonlyα.a.Writedownthegeneralformfortheloglikelihoodfunctionusingthejointdensity.b.Sincethejointdensityequalstheproductoftheconditionaltimesthemarginal,thelog-likelihoodfunctioncanbewrittenequivalentlyintermsofthefactoreddensity.Writethisdown,ingeneralterms.c.Theparameterαcanbeestimatedbyitselfusingonlythedataonxandtheloglikelihoodformedusingthemarginaldensityforx.Itcanalsobeestimatedwithβbyusingthefulllog-likelihoodfunctionanddataonbothyandx.Showthis.d.ShowthatthefirstestimatorinPartchasalargerasymptoticvariancethanthesecondone.Thisisthedifferencebetweenalimitedinformationmaximumlikelihoodestimatorandafullinformationmaximumlikelihoodestimator.e.Showthatif∂2lnf(y|x,α,β)/∂α∂β=0,thentheresultinPartdisnolongertrue.7.ShowthatthelikelihoodinequalityinTheorem17.3holdsforthePoissondistribu-tionusedinSection17.3byshowingthatE[(1/n)lnL(θ|y)]isuniqu
elymaximizedatθ=θ0.Hint:Firstshowthattheexpectationis−θ+θ0lnθ−E0[lnyi!].8.ShowthatthelikelihoodinequalityinTheorem17.3holdsforthenormaldistribution.9.Forrandomsamplingfromtheclassicalregressionmodelin(17-3),reparameterizethelikelihoodfunctionintermsofη=1/σandδ=(1/σ)β.Findthemaximum\nGreene-50240bookJune26,200215:8524CHAPTER17✦MaximumLikelihoodEstimationlikelihoodestimatorsofηandδandobtaintheasymptoticcovariancematrixoftheestimatorsoftheseparameters.10.Section14.3.1presentsestimatesofaCobb–DouglascostfunctionusingNerlove’s1955dataontheU.S.electricpowerindustry.ChristensenandGreene’s1976updateofthisstudyused1970dataforthisindustry.TheChristensenandGreenedataaregiveninTableF5.2.Thesedatahaveprovidedastandardtestdatasetforestimatingdifferentformsofproductionandcostfunctions,includingthestochasticfrontiermodelexaminedinExample17.5.Ithasbeensuggestedthatoneexplanationfortheapparentfindingofeconomiesofscaleinthesedataisthatthesmallerfirmswereinefficientforotherreasons.Thestochasticfrontiermightallowonetodisentangletheseeffects.Usethesedatatofitafrontiercostfunctionwhichincludesaquadraticterminlogoutputinadditiontothelineartermandthefactorprices.ThenexaminetheestimatedJondrowetal.residualstoseeiftheydoindeedvarynegativelywithoutput,assuggested.(Thiswillrequireeithersomeprogrammingonyourpartorspecializedsoftware.ThestochasticfrontiermodelisprovidedasanoptioninTSPandLIMDEP.Or,thelikelihoodfunctioncanbeprogrammedfairlyeasilyforRATSorGAUSS.Note,foracostfrontierasopposedtoaproductionfrontier,itisnecessarytoreversethesignontheargumentinthefunction.)11.Consider,samplingfromamultivariatenormaldistributionwithmeanvectorµ=(µ,µ,...,µ)andcovariancematrixσ2I.Thelog-likelihoodfunctionis12M−nMnM1nlnL=ln(2π)−lnσ2−(y−µ)(y−µ).2ii222σi=1ShowthatthemaximumlikelihoodestimatesoftheparametersarenM2MnM2i=1m=1(yim−y¯m)11212σˆML==(yim−y¯m)=σˆm.nMMnMm=1i=1m=1Derivethesecondderivativesmatrixandshowthattheasymptoticcovariancematrixforthemaximumlikelihoodestimatorsis−1∂2lnLσ2I/n0−E=4.∂θ∂θ02σ/(nM)Suppose
that we wished to test the hypothesis that the means of the $M$ distributions were all equal to a particular value $\mu^0$. Show that the Wald statistic would be

\[ W = (\bar{\mathbf{y}} - \mu^0\mathbf{i})'\left(\frac{\hat\sigma^2}{n}\mathbf{I}\right)^{-1}(\bar{\mathbf{y}} - \mu^0\mathbf{i}) = \frac{n}{s^2}(\bar{\mathbf{y}} - \mu^0\mathbf{i})'(\bar{\mathbf{y}} - \mu^0\mathbf{i}), \]

where $\bar{\mathbf{y}}$ is the vector of sample means.

18 THE GENERALIZED METHOD OF MOMENTS

18.1 INTRODUCTION

The maximum likelihood estimator is fully efficient among consistent and asymptotically normally distributed estimators, in the context of the specified parametric model. The possible shortcoming in this result is that to attain that efficiency, it is necessary to make possibly strong, restrictive assumptions about the distribution, or data generating process. The generalized method of moments (GMM) estimators discussed in this chapter move away from parametric assumptions, toward estimators which are robust to some variations in the underlying data generating process.

This chapter will present a number of fairly general results on parameter estimation. We begin with perhaps the oldest formalized theory of estimation, the classical theory of the method of moments. This body of results dates to the pioneering work of Fisher (1925). The use of sample moments as the building blocks of estimating equations is fundamental in econometrics. GMM is an extension of this technique which, as will be clear shortly, encompasses nearly all the familiar estimators discussed in this book. Section 18.2 will introduce the estimation framework with the method of moments. Formalities of the GMM estimator are presented in Section 18.3. Section 18.4 discusses hypothesis testing based on moment equations. A major application, dynamic panel data models, is described in Section 18.5.

Example 18.1  Euler Equations and Life Cycle Consumption

One of the most often cited applications of the GMM principle for estimating econometric models is Hall’s (1978) permanent income model of consumption. The original form of the model (with some small changes in notation) posits a hypothesis about the optimizing behavior of a consumer over the life cycle. Consumers are hypothesized to act according to the model:

\[ \text{Maximize } E_t\left[\sum_{\tau=0}^{T-t}\left(\frac{1}{1+\delta}\right)^{\tau} U(c_{t+\tau}) \,\Big|\, \Omega_t\right] \quad \text{subject to} \quad \sum_{\tau=0}^{T-t}\left(\frac{1}{1+r}\right)^{\tau}(c_{t+\tau} - w_{t+\tau}) = A_t. \]

The information available at time $t$ is denoted $\Omega_t$, so that $E_t$ denotes the expectation formed at time $t$ based on information set $\Omega_t$. The maximand is the expected discounted stream of future consumption from time $t$ until the end of life at time $T$. The individual’s subjective rate of time preference is $\beta = 1/(1+\delta)$. The real rate of interest, $r \ge \delta$, is assumed to be constant. The utility function $U(c_t)$ is assumed to be strictly concave and time separable (as shown in the model). One period’s consumption is $c_t$. The intertemporal budget constraint states that the present discounted excess of $c_t$ over earnings, $w_t$, over the lifetime equals total assets $A_t$, not including human capital. In this model, it is claimed that the only source of uncertainty is $w_t$. No assumption is made about the stochastic properties of $w_t$ except that there exists an expected future earnings, $E_t[w_{t+\tau} \mid \Omega_t]$. Successive values are not assumed to be independent and $w_t$ is not assumed to be stationary.

Hall’s major “theorem” in the paper is the solution to the optimization problem, which states

\[ E_t[U'(c_{t+1}) \mid \Omega_t] = \frac{1+\delta}{1+r}\,U'(c_t). \]

For our purposes, the major conclusion of the paper is “Corollary 1,” which states “No information available in time $t$ apart from the level of consumption, $c_t$, helps predict future consumption, $c_{t+1}$, in the sense of affecting the expected value of marginal utility. In particular, income or wealth in periods $t$ or earlier are irrelevant once $c_t$ is known.” We can use this as the basis of a model that can be placed in the GMM framework. In order to proceed, it is necessary to assume a form of the utility function. A common (convenient) form of the utility function is $U(c_t) = c_t^{1-\alpha}/(1-\alpha)$, which is monotonic, $U' = c_t^{-\alpha} > 0$, and concave, $U''/U' = -\alpha/c_t < 0$. Inserting this form into the solution, rearranging the terms, and reparameterizing it for convenience (with $R_{t+1} = c_{t+1}/c_t$ and $\lambda = -\alpha$), we have

\[ E_t\left[(1+r)\left(\frac{1}{1+\delta}\right)\left(\frac{c_{t+1}}{c_t}\right)^{-\alpha} - 1 \,\Big|\, \Omega_t\right] = E_t\left[\beta(1+r)R_{t+1}^{\lambda} - 1 \,\big|\, \Omega_t\right] = 0. \]

Hall assumed that $r$ was constant over time. Other applications of this modeling framework [e.g., Hansen and Singleton (1982)] have modified the framework so as to involve a forecasted interest rate, $r_{t+1}$. How one proceeds from here depends on what is in the information set. The unconditional mean does not identify the two parameters. The corollary states that the only relevant information in the information set is $c_t$. Given the form of the model, the more natural instrument might be $R_t$. This assumption exactly identifies the two parameters in the model:

\[ E_t\left[\left(\beta(1+r_{t+1})R_{t+1}^{\lambda} - 1\right)\begin{pmatrix} 1 \\ R_t \end{pmatrix}\right] = \begin{pmatrix} 0 \\ 0 \end{pmatrix}. \]

As stated, the model has no testable implications. These two moment equations would exactly identify the two unknown parameters. Hall hypothesized several models involving income and consumption which would overidentify and thus place restrictions on the model.

18.2 CONSISTENT ESTIMATION: THE METHOD OF MOMENTS

Sample statistics such as the mean and variance can be treated as simple descriptive measures. In our discussion of estimation in Appendix C, however, we argued that, in general, sample statistics each have a counterpart in the population, for example, the correspondence between the sample mean and the population expected value. The natural (perhaps obvious) next step in the analysis is to use this analogy to justify using the sample “moments” as estimators of these population parameters. What remains to establish is whether this approach is the best, or even a good, way to use the sample data to infer the characteristics of the population.

The basis of the method of moments is as follows: In random sampling, under generally benign assumptions, a sample statistic will converge in probability to some constant. For example, with i.i.d. random sampling, $\bar m'_2 = (1/n)\sum_{i=1}^{n} y_i^2$ will converge in mean square to the variance plus the square of the mean of the distribution of $y_i$. This constant will, in turn, be a function of the unknown parameters of the distribution. To estimate $K$ parameters, $\theta_1, \ldots, \theta_K$, we can compute $K$ such statistics, $\bar m_1, \ldots, \bar m_K$, whose probability limits are known functions of the parameters. These $K$ moments are equated to the $K$ functions, and the functions are inverted to express the parameters as functions of the moments. The moments will be consistent by virtue of a law of large numbers (Theorems D.4–D.9). They will be asymptotically normally distributed by virtue of the Lindeberg–Levy Central Limit Theorem (D.18). The derived parameter estimators will inherit consistency by virtue of the Slutsky Theorem (D.12) and asymptotic normality by virtue of the delta method (Theorem D.21).

This section will develop this technique in some detail, partly to present it in its own right and partly as a prelude to the discussion of the generalized method of moments, or GMM, estimation technique, which is treated in Section 18.3.

18.2.1 RANDOM SAMPLING AND ESTIMATING THE PARAMETERS OF DISTRIBUTIONS

Consider independent, identically distributed random sampling from a distribution $f(y \mid \theta_1, \ldots, \theta_K)$ with finite moments up to $E[y^{2K}]$. The sample consists of $n$ observations, $y_1, \ldots, y_n$. The $k$th “raw” or uncentered moment is

\[ \bar m'_k = \frac{1}{n}\sum_{i=1}^{n} y_i^k. \]

By Theorem D.1,

\[ E[\bar m'_k] = \mu'_k = E[y_i^k] \]

and

\[ \mathrm{Var}[\bar m'_k] = \frac{1}{n}\mathrm{Var}[y_i^k] = \frac{1}{n}\left(\mu'_{2k} - \mu'^{\,2}_k\right). \]

By convention, $\mu'_1 = E[y_i] = \mu$. By the Khinchine Theorem, D.5,

\[ \operatorname{plim} \bar m'_k = \mu'_k = E[y_i^k]. \]

Finally, by the Lindeberg–Levy Central Limit Theorem,

\[ \sqrt{n}\,(\bar m'_k - \mu'_k) \xrightarrow{d} N\left[0,\ \mu'_{2k} - \mu'^{\,2}_k\right]. \]

In general, $\mu'_k$ will be a function of the underlying parameters. By computing $K$ raw moments and equating them to these functions, we obtain $K$ equations that can (in principle) be solved to provide estimates of the $K$ unknown parameters.

Example 18.2  Method of Moments Estimator for $N[\mu, \sigma^2]$

In random sampling from $N[\mu, \sigma^2]$,

\[ \operatorname{plim}\frac{1}{n}\sum_{i=1}^{n} y_i = \operatorname{plim}\bar m'_1 = E[y_i] = \mu \]

and

\[ \operatorname{plim}\frac{1}{n}\sum_{i=1}^{n} y_i^2 = \operatorname{plim}\bar m'_2 = \mathrm{Var}[y_i] + \mu^2 = \sigma^2 + \mu^2. \]

Equating the right- and left-hand sides of the probability limits gives moment estimators

\[ \hat\mu = \bar m'_1 = \bar y \]

and

\[ \hat\sigma^2 = \bar m'_2 - \bar m'^{\,2}_1 = \frac{1}{n}\sum_{i=1}^{n} y_i^2 - \left(\frac{1}{n}\sum_{i=1}^{n} y_i\right)^2 = \frac{1}{n}\sum_{i=1}^{n}(y_i - \bar y)^2. \]

Note that $\hat\sigma^2$ is biased, although both estimators are consistent.

Although the moments based on powers of $y$ provide a natural source of information about the parameters, other functions of the data may also be useful. Let $m_k(\cdot)$ be a continuous and differentiable function not involving the sample size $n$, and let

\[ \bar m_k = \frac{1}{n}\sum_{i=1}^{n} m_k(y_i), \quad k = 1, 2, \ldots, K. \]

These are also “moments” of the data. It follows from Theorem D.4 and the corollary, (D-5), that

\[ \operatorname{plim}\bar m_k = E[m_k(y_i)] = \mu_k(\theta_1, \ldots, \theta_K). \]

We assume that $\mu_k(\cdot)$ involves some of or all the parameters of the distribution. With $K$ parameters to be estimated, the $K$ moment equations,

\[ \begin{aligned} \bar m_1 - \mu_1(\theta_1, \ldots, \theta_K) &= 0, \\ \bar m_2 - \mu_2(\theta_1, \ldots, \theta_K) &= 0, \\ &\ \,\vdots \\ \bar m_K - \mu_K(\theta_1, \ldots, \theta_K) &= 0, \end{aligned} \]

provide $K$ equations in $K$ unknowns, $\theta_1, \ldots, \theta_K$. If the equations are continuous and functionally independent, then method of moments estimators can be obtained by solving the system of equations for

\[ \hat\theta_k = \hat\theta_k[\bar m_1, \ldots, \bar m_K]. \]

As suggested, there may be more than one set of moments that one can use for estimating the parameters, or there may be more moment equations available than are necessary.

Example 18.3  Inverse Gaussian (Wald) Distribution

The inverse Gaussian distribution is used to model survival times, or elapsed times from some beginning time until some kind of transition takes place. The standard form of the density for this random variable is

\[ f(y) = \sqrt{\frac{\lambda}{2\pi y^3}}\exp\left[-\frac{\lambda(y-\mu)^2}{2\mu^2 y}\right], \quad y > 0,\ \lambda > 0,\ \mu > 0. \]

The mean is $\mu$ while the variance is $\mu^3/\lambda$. The efficient maximum likelihood estimators of the two parameters are based on $(1/n)\sum_{i=1}^{n} y_i$ and $(1/n)\sum_{i=1}^{n}(1/y_i)$. Since the mean and variance are simple functions of the underlying parameters, we can also use the sample mean and sample variance as moment estimators of these functions. Thus, an alternative pair of method of moments estimators for the parameters of the Wald distribution can be based on $(1/n)\sum_{i=1}^{n} y_i$ and $(1/n)\sum_{i=1}^{n} y_i^2$. The precise formulas for these two pairs of estimators are left as an exercise.

Example 18.4  Mixtures of Normal Distributions

Quandt and Ramsey (1978) analyzed the problem of estimating the parameters of a mixture of normal distributions. Suppose that each observation in a random sample is drawn from one of two different normal distributions. The probability that the observation is drawn from the first distribution, $N[\mu_1, \sigma_1^2]$, is $\lambda$, and the probability that it is drawn from the second is $(1-\lambda)$. The density for the observed $y$ is

\[ \begin{aligned} f(y) &= \lambda N[\mu_1, \sigma_1^2] + (1-\lambda)N[\mu_2, \sigma_2^2], \quad 0 \le \lambda \le 1 \\ &= \frac{\lambda}{(2\pi\sigma_1^2)^{1/2}}e^{-\frac{1}{2}[(y-\mu_1)/\sigma_1]^2} + \frac{1-\lambda}{(2\pi\sigma_2^2)^{1/2}}e^{-\frac{1}{2}[(y-\mu_2)/\sigma_2]^2}. \end{aligned} \]

The sample mean and second through fifth central moments,

\[ \bar m_k = \frac{1}{n}\sum_{i=1}^{n}(y_i - \bar y)^k, \quad k = 2, 3, 4, 5, \]

provide five equations in five unknowns that can be solved (via a ninth-order polynomial) for consistent estimators of the five parameters. Because $\bar y$ converges in probability to $E[y_i] = \mu$, the theorems given earlier for $\bar m'_k$ as an estimator of $\mu'_k$ apply as well to $\bar m_k$ as an estimator of

\[ \mu_k = E[(y_i - \mu)^k]. \]

For the mixed normal distribution, the mean and variance are

\[ \mu = E[y_i] = \lambda\mu_1 + (1-\lambda)\mu_2 \]

and

\[ \sigma^2 = \mathrm{Var}[y_i] = \lambda\sigma_1^2 + (1-\lambda)\sigma_2^2 + \lambda(1-\lambda)(\mu_1 - \mu_2)^2, \]

which suggests how complicated the familiar method of moments is likely to become. An alternative method of estimation proposed by the authors is based on

\[ E[e^{ty_i}] = \lambda e^{t\mu_1 + t^2\sigma_1^2/2} + (1-\lambda)e^{t\mu_2 + t^2\sigma_2^2/2} = \Lambda_t, \]

where $t$ is any value, not necessarily an integer. Quandt and Ramsey (1978) suggest choosing five values of $t$ that are not too close together and using the statistics

\[ \bar M_t = \frac{1}{n}\sum_{i=1}^{n} e^{ty_i} \]

to estimate the parameters. The moment equations are $\bar M_t - \Lambda_t(\mu_1, \mu_2, \sigma_1^2, \sigma_2^2, \lambda) = 0$. They label this procedure the method of moment-generating functions. (See Section B.6 for the definition of the moment generating function.)

In most cases, method of moments estimators are not efficient. The exception is in random sampling from exponential families of distributions.

DEFINITION 18.1  Exponential Family
An exponential (parametric) family of distributions is one whose log-likelihood is of the form

\[ \ln L(\theta \mid \text{data}) = a(\text{data}) + b(\theta) + \sum_{k=1}^{K} c_k(\text{data})\,s_k(\theta), \]

where $a(\cdot)$, $b(\cdot)$, $c(\cdot)$, and $s(\cdot)$ are functions. The members of the “family” are distinguished by the different parameter values.

If the log-likelihood function is of this form, then the functions $c_k(\cdot)$ are called sufficient statistics.¹ When sufficient statistics exist, method of moments estimator(s) can be functions of them. In this case, the method of moments estimators will also be the maximum likelihood estimators, so, of course, they will be efficient, at least asymptotically. We emphasize that, in this case, the probability distribution is fully specified. Since the normal distribution is an exponential family with sufficient statistics $\bar m'_1$ and $\bar m'_2$, the estimators described in Example 18.2 are fully efficient. (They are the maximum likelihood estimators.) The mixed normal distribution is not an exponential family. We leave it as an exercise to show that the Wald distribution in Example 18.3 is an exponential family. You should be able to show that the sufficient statistics are the ones that are suggested in Example 18.3 as the bases for the MLEs of $\mu$ and $\lambda$.

Example 18.5  Gamma Distribution

The gamma distribution (see Section C.4.5) is

\[ f(y) = \frac{\lambda^P}{\Gamma(P)}e^{-\lambda y}y^{P-1}, \quad y > 0,\ P > 0,\ \lambda > 0. \]

The log-likelihood function for this distribution is

\[ \frac{1}{n}\ln L = [P\ln\lambda - \ln\Gamma(P)] - \lambda\frac{1}{n}\sum_{i=1}^{n} y_i + (P-1)\frac{1}{n}\sum_{i=1}^{n}\ln y_i. \]

This function is an exponential family with $a(\text{data}) = 0$, $b(\theta) = n[P\ln\lambda - \ln\Gamma(P)]$, and two sufficient statistics, $\frac{1}{n}\sum_{i=1}^{n} y_i$ and $\frac{1}{n}\sum_{i=1}^{n}\ln y_i$. The method of moments estimators based on $\frac{1}{n}\sum_{i=1}^{n} y_i$ and $\frac{1}{n}\sum_{i=1}^{n}\ln y_i$ would be the maximum likelihood estimators. But we also have

\[ \operatorname{plim}\frac{1}{n}\sum_{i=1}^{n}\begin{bmatrix} y_i \\ y_i^2 \\ \ln y_i \\ 1/y_i \end{bmatrix} = \begin{bmatrix} P/\lambda \\ P(P+1)/\lambda^2 \\ \Psi(P) - \ln\lambda \\ \lambda/(P-1) \end{bmatrix}. \]

(The functions $\Gamma(P)$ and $\Psi(P) = d\ln\Gamma(P)/dP$ are discussed in Section E.5.3.) Any two of these can be used to estimate $\lambda$ and $P$.

¹ Stuart and Ord
(1989, pp. 1–29) give a discussion of sufficient statistics and exponential families of distributions. A result that we will use in Chapter 21 is that if the statistics $c_k(\text{data})$ are sufficient statistics, then the conditional density $f[y_1, \ldots, y_n \mid c_k(\text{data}), k = 1, \ldots, K]$ is not a function of the parameters.

For the income data in Example C.1, the four moments listed above are

\[ (\bar m'_1, \bar m'_2, \bar m'_*, \bar m'_{-1}) = \frac{1}{n}\sum_{i=1}^{n}\left[y_i,\ y_i^2,\ \ln y_i,\ \frac{1}{y_i}\right] = [31.278,\ 1453.96,\ 3.22139,\ 0.050014]. \]

The method of moments estimators of $\theta = (P, \lambda)$ based on the six possible pairs of these moments are as follows:

\[ (\hat P, \hat\lambda) = \begin{array}{l|ccc} & \bar m'_1 & \bar m'_2 & \bar m'_{-1} \\ \hline \bar m'_2 & 2.05682,\ 0.065759 & & \\ \bar m'_{-1} & 2.77198,\ 0.0886239 & 2.60905,\ 0.0800475 & \\ \bar m'_* & 2.4106,\ 0.0770702 & 2.26450,\ 0.071304 & 3.03580,\ 0.1018202 \end{array} \]

The maximum likelihood estimates are $\hat\theta(\bar m'_1, \bar m'_*) = (2.4106, 0.0770702)$.

18.2.2 ASYMPTOTIC PROPERTIES OF THE METHOD OF MOMENTS ESTIMATOR

In a few cases, we can obtain the exact distribution of the method of moments estimator. For example, in sampling from the normal distribution, $\hat\mu$ has mean $\mu$ and variance $\sigma^2/n$ and is normally distributed, while $\hat\sigma^2$ has mean $[(n-1)/n]\sigma^2$ and variance $[(n-1)/n]^2\,2\sigma^4/(n-1)$ and is exactly distributed as a multiple of a chi-squared variate with $(n-1)$ degrees of freedom. If sampling is not from the normal distribution, the exact variance of the sample mean will still be $\mathrm{Var}[y]/n$, whereas an asymptotic variance for the moment estimator of the population variance could be based on the leading term in (D-27), in Example D.10, but the precise distribution may be intractable.

There are cases in which no explicit expression is available for the variance of the underlying sample moment. For instance, in Example 18.4, the underlying sample statistic is

\[ \bar M_t = \frac{1}{n}\sum_{i=1}^{n} e^{ty_i} = \frac{1}{n}\sum_{i=1}^{n} M_{it}. \]

The exact variance of $\bar M_t$ is known only if $t$ is an integer. But if sampling is random, then since $\bar M_t$ is a sample mean, we can estimate its variance with $1/n$ times the sample variance of the observations on $M_{it}$. We can also construct an estimator of the covariance of $\bar M_t$ and $\bar M_s$:

\[ \text{Est. Asy. Cov}[\bar M_t, \bar M_s] = \frac{1}{n}\left\{\frac{1}{n}\sum_{i=1}^{n}\left[(e^{ty_i} - \bar M_t)(e^{sy_i} - \bar M_s)\right]\right\}. \]

In general, when the moments are computed as

\[ \bar m_k = \frac{1}{n}\sum_{i=1}^{n} m_k(y_i), \quad k = 1, \ldots, K, \]

where $y_i$ is an observation on a vector of variables, an appropriate estimator of the asymptotic covariance matrix of $[\bar m_1, \ldots, \bar m_K]$ can be computed using

\[ \frac{1}{n}F_{jk} = \frac{1}{n}\left\{\frac{1}{n}\sum_{i=1}^{n}\left[(m_j(y_i) - \bar m_j)(m_k(y_i) - \bar m_k)\right]\right\}, \quad j, k = 1, \ldots, K. \]

(One might divide the inner sum by $n-1$ rather than $n$. Asymptotically it is the same.) This estimator provides the asymptotic covariance matrix for the moments used in computing the estimated parameters. Under our assumption of iid random sampling from a distribution with finite moments up to $2K$, $F$ will converge in probability to the appropriate covariance matrix of the normalized vector of moments, $\Phi = \text{Asy. Var}[\sqrt{n}\,\bar m_n(\theta)]$. Finally, under our assumptions of random sampling, though the precise distribution is likely to be unknown, we can appeal to the Lindeberg–Levy central limit theorem (D.18) to obtain an asymptotic approximation.

To formalize the remainder of this derivation, refer back to the moment equations, which we will now write

\[ \bar m_{n,k}(\theta_1, \theta_2, \ldots, \theta_K) = 0, \quad k = 1, \ldots, K. \]

The subscript $n$ indicates the dependence on a data set of $n$ observations. We have also combined the sample statistic (sum) and function of parameters, $\mu(\theta_1, \ldots, \theta_K)$, in this general form of the moment equation. Let $\bar G_n(\theta)$ be the $K \times K$ matrix whose $k$th row is the vector of partial derivatives

\[ \bar G'_{n,k} = \frac{\partial\bar m_{n,k}}{\partial\theta'}. \]

Now, expand the set of solved moment equations around the true values of the parameters $\theta_0$ in a linear Taylor series. The linear approximation is

\[ 0 \approx [\bar m_n(\theta_0)] + \bar G_n(\theta_0)(\hat\theta - \theta_0). \]

Therefore,

\[ \sqrt{n}\,(\hat\theta - \theta_0) \approx -[\bar G_n(\theta_0)]^{-1}\sqrt{n}\,[\bar m_n(\theta_0)]. \tag{18-1} \]

(We have treated this as an approximation because we are not dealing formally with the higher order term in the Taylor series. We will make this explicit in the treatment of the GMM estimator below.) The arguments needed to characterize the large sample behavior of the estimator, $\hat\theta$, are discussed in Appendix D. We have from Theorem D.18 (the Central Limit Theorem) that $\sqrt{n}\,\bar m_n(\theta_0)$ has a limiting normal distribution with mean vector 0 and covariance matrix equal to $\Phi$. Assuming that the functions in the moment equation are continuous and functionally independent, we can expect $\bar G_n(\theta_0)$ to converge to a nonsingular matrix of constants, $\Gamma(\theta_0)$. Under general conditions, the limiting distribution of the right-hand side of (18-1) will be that of a linear function of a normally distributed vector. Jumping to the conclusion, we expect the asymptotic distribution of $\hat\theta$ to be normal with mean vector $\theta_0$ and covariance matrix $(1/n) \times \{-[\Gamma(\theta_0)]^{-1}\}\,\Phi\,\{-[\Gamma'(\theta_0)]^{-1}\}$. Thus, the asymptotic covariance matrix for the method of moments estimator may be estimated with

\[ \text{Est. Asy. Var}[\hat\theta] = \frac{1}{n}\left[\bar G'_n(\hat\theta)\,F^{-1}\,\bar G_n(\hat\theta)\right]^{-1}. \]

Example 18.5 (Continued)

Using the estimates $\hat\theta(\bar m'_1, \bar m'_*) = (2.4106, 0.0770702)$,

\[ \hat{\bar G} = \begin{bmatrix} -1/\hat\lambda & \hat P/\hat\lambda^2 \\ -\hat\Psi' & 1/\hat\lambda \end{bmatrix} = \begin{bmatrix} -12.97515 & 405.8353 \\ -0.51241 & 12.97515 \end{bmatrix}. \]

[The function $\Psi'$ is $d^2\ln\Gamma(P)/dP^2 = (\Gamma\Gamma'' - \Gamma'^2)/\Gamma^2$. With $\hat P = 2.4106$, $\hat\Gamma = 1.250832$, $\hat\Psi = 0.658347$, and $\hat\Psi' = 0.512408$.]² The matrix $F$ is the sample covariance matrix of $y$ and $\ln y$ (using 1/19 as the divisor),

\[ F = \begin{bmatrix} 25.034 & 0.7155 \\ 0.7155 & 0.023873 \end{bmatrix}. \]

The product is

\[ \frac{1}{n}\left[\hat G'F^{-1}\hat G\right]^{-1} = \begin{bmatrix} 0.38978 & 0.014605 \\ 0.014605 & 0.00068747 \end{bmatrix}. \]

For the maximum likelihood estimator, the estimate of the asymptotic covariance matrix based on the expected (and actual) Hessian is

\[ \frac{1}{n}[-H]^{-1} = \frac{1}{n}\begin{bmatrix} \Psi' & -1/\lambda \\ -1/\lambda & P/\lambda^2 \end{bmatrix}^{-1} = \begin{bmatrix} 0.51203 & 0.01637 \\ 0.01637 & 0.00064654 \end{bmatrix}. \]

The Hessian has the same elements as $G$ because we chose to use the sufficient statistics for the moment estimators, so the moment equations that we differentiated are, apart from a sign change, also the derivatives of the log-likelihood. The estimates of the two variances are 0.51203 and 0.00064654, respectively, which agree reasonably well with the estimates above. The difference would be due to sampling variability in a finite sample and the presence of $F$ in the first variance estimator.

² $\Psi$ is the digamma function. Values for $\Gamma(P)$, $\Psi(P)$, and $\Psi'(P)$ are tabulated in Abramowitz and Stegun (1971). The values given were obtained using the IMSL computer program library.

18.2.3 SUMMARY: THE METHOD OF MOMENTS

In the simplest cases, the method of moments is robust to differences in the specification of the data generating process. A sample mean or variance estimates its population counterpart (assuming it exists), regardless of the underlying process. It is this freedom from unnecessary distributional assumptions that has made this method so popular in recent years. However, this comes at a cost. If more is known about the DGP, its specific distribution for example, then the method of moments may not make use of all of the available information. Thus, in Example 18.3, the natural estimators of the parameters of the distribution based on the sample mean and variance turn out to be inefficient. The method of maximum likelihood, which remains the foundation of much work in econometrics, is an alternative approach which utilizes this out of sample information and is, therefore, more efficient.

18.3 THE GENERALIZED METHOD OF MOMENTS (GMM) ESTIMATOR

A large proportion of the recent empirical work in econometrics, particularly in macroeconomics and finance, has employed GMM estimators. As we shall see, this broad class of estimators, in fact, includes most of the estimators discussed elsewhere in this book. Before continuing, it will be useful for you to read (or reread) the following sections:

1. Consistent Estimation: The Method of Moments: Section 18.2,
2. Correlation Between $x_i$ and $\varepsilon_i$: Instrumental Variables Estimation, Section 5.4,
3. GMM Estimation in the Generalized Regression Model: Sections 10.4, 11.3, and 12.6,
4. Nonlinear Regression Models, Chapter 9,
5. Optimization, Section E.5,
6. Robust Estimation of Asymptotic Covariance Matrices, Section 10.3,
7. The Wald Test, Theorem 6.1,
8. GMM Estimation of Dynamic Panel Data Models, Section 13.6.

The GMM estimation technique is an extension of the method of moments technique described in Section 18.2.³ In the following, we will extend the generalized method of moments to other models beyond the generalized linear regression, and we will fill in some gaps in the derivation in Section 18.2.

³ Formal presentations of the results required for this analysis are given by Hansen (1982); Hansen and Singleton (1988); Chamberlain (1987); Cumby, Huizinga, and Obstfeld (1983); Newey (1984, 1985a, 1985b); Davidson and MacKinnon (1993); and McFadden and Newey (1994). Useful summaries of GMM estimation and other developments in econometrics are Pagan and Wickens (1989) and Matyas (1999). An application of some of these techniques that contains useful summaries is Pagan and Vella (1989). Some further discussion can be found in Davidson and MacKinnon (1993). Ruud (2000) provides many of the theoretical details. Hayashi (2000) is another extensive treatment of estimation centered on GMM estimators.

18.3.1 ESTIMATION BASED ON ORTHOGONALITY CONDITIONS

Estimation by the method of moments proceeds as follows. The model specified for the random variable $y_i$ implies certain expectations, for example

\[ E[y_i] = \mu, \]

where $\mu$ is the mean of the distribution of $y_i$. Estimation of $\mu$ then proceeds by forming a sample analog to the population expectation:

\[ E[y_i - \mu] = 0. \]

The sample counterpart to this expectation is the empirical moment equation,

\[ \frac{1}{n}\sum_{i=1}^{n}(y_i - \hat\mu) = 0. \]

The estimator is the value of $\hat\mu$ that satisfies the sample moment equation. The example given is, of course, a trivial one. Example 18.5 describes a more elaborate case of sampling from a gamma distribution. The moment conditions used for estimation in that example (taken two at a time from a set of four) include

\[ E[y_i - P/\lambda] = 0 \]

and

\[ E[\ln y_i - \Psi(P) + \ln\lambda] = 0. \]

(These two coincide with the terms in the likelihood equations for this model.) Inserting the sample data into the sample analogs produces the moment equations for estimation:

\[ \frac{1}{n}\sum_{i=1}^{n}[y_i - \hat P/\hat\lambda] = 0 \]

and

\[ \frac{1}{n}\sum_{i=1}^{n}[\ln y_i - \Psi(\hat P) + \ln\hat\lambda] = 0. \]

Example 18.6  Orthogonality Conditions

Assuming that households are forecasting interest rates as well as earnings, Hall’s consumption model with the corollary implies the following orthogonality conditions:

\[ E_t\left[\left(\beta(1+r_{t+1})R_{t+1}^{\lambda} - 1\right) \times \begin{pmatrix} 1 \\ R_t \end{pmatrix}\right] = \begin{pmatrix} 0 \\ 0 \end{pmatrix}. \]

Now, consider the apparently different case of the least squares estimator of the parameters in the classical linear regression model. An important assumption of the model is

\[ E[x_i\varepsilon_i] = E[x_i(y_i - x_i'\beta)] = 0. \]

The sample analog is

\[ \frac{1}{n}\sum_{i=1}^{n} x_i\hat\varepsilon_i = \frac{1}{n}\sum_{i=1}^{n} x_i(y_i - x_i'\hat\beta) = 0. \]

The estimator of $\beta$ is the one that satisfies these moment equations, which are just the normal equations for the least squares estimator. So, we see that the OLS estimator is a method of moments estimator.

For the instrumental variables estimator of Section 5.4, we relied on a large sample analog to the moment condition,

\[ \operatorname{plim}\frac{1}{n}\sum_{i=1}^{n} z_i\varepsilon_i = \operatorname{plim}\frac{1}{n}\sum_{i=1}^{n} z_i(y_i - x_i'\beta) = 0. \]

We resolved the problem of having more instruments than parameters by solving the equations

\[ \left(\frac{1}{n}X'Z\right)\left(\frac{1}{n}Z'Z\right)^{-1}\left(\frac{1}{n}Z'\hat\varepsilon\right) = \frac{1}{n}\hat X'e = \frac{1}{n}\sum_{i=1}^{n}\hat x_i\hat\varepsilon_i = 0, \]

where the columns of $\hat X$ are the fitted values in regressions on all the columns of $Z$ (that is, the projections of these columns of $X$ into the column space of $Z$). (See Section 5.4 for further details.)

The nonlinear least squares estimator was defined similarly, though in this case, the normal equations are more complicated since the estimator is only implicit. The population orthogonality condition for the nonlinear regression model is $E[x_i^0\varepsilon_i] = 0$. The empirical moment equation is

\[ \frac{1}{n}\sum_{i=1}^{n}\left(\frac{\partial E[y_i \mid x_i, \beta]}{\partial\beta}\right)(y_i - E[y_i \mid x_i, \beta]) = 0. \]

All the maximum likelihood estimators that we have looked at thus far and will encounter later are obtained by equating the derivatives of a log-likelihood to zero. The scaled log-likelihood function is

\[ \frac{1}{n}\ln L = \frac{1}{n}\sum_{i=1}^{n}\ln f(y_i \mid \theta, x_i), \]

where $f(\cdot)$ is the density function and $\theta$ is the parameter vector. For densities that satisfy the regularity conditions [see Section 17.4.1],

\[ E\left[\frac{\partial\ln f(y_i \mid \theta, x_i)}{\partial\theta}\right] = 0. \]

The maximum likelihood estimator is obtained by equating the sample analog to zero:

\[ \frac{1}{n}\frac{\partial\ln L}{\partial\hat\theta} = \frac{1}{n}\sum_{i=1}^{n}\frac{\partial\ln f(y_i \mid x_i, \hat\theta)}{\partial\hat\theta} = 0. \]

(Dividing by $n$ to make this result comparable with our earlier ones does not change the solution.) The upshot is that nearly all the estimators we have discussed and will encounter later can be construed as method of moments estimators. [Manski’s (1992) treatment of analog estimation provides some interesting extensions and methodological discourse.]

As we extend this line of reasoning, it will emerge that nearly all the estimators defined in this book can be viewed as method of moments estimators.

18.3.2 GENERALIZING THE METHOD OF MOMENTS

The preceding examples all have a common aspect. In each case listed, save for the general case of the instrumental variable estimator, there are exactly as many moment equations as there are parameters to be estimated. Thus, each of these is an exactly identified case. There will be a single solution to the moment equations, and at that solution, the equations will be exactly satisfied.⁴ But there are cases in which there are more moment equations than parameters, so the system is overdetermined. In Example 18.5, we defined four sample moments,

\[ \bar g = \frac{1}{n}\sum_{i=1}^{n}\left[y_i,\ y_i^2,\ \frac{1}{y_i},\ \ln y_i\right] \]

with probability limits $P/\lambda$, $P(P+1)/\lambda^2$, $\lambda/(P-1)$, and $\Psi(P) - \ln\lambda$, respectively. Any pair could be used to estimate the two parameters, but as shown in the earlier example, the six pairs produce six somewhat different estimates of $\theta = (P, \lambda)$.

In such a case, to use all the information in the sample it is necessary to devise a way to reconcile the conflicting estimates that may emerge from the overdetermined system. More generally, suppose that the model involves $K$ parameters, $\theta = (\theta_1, \theta_2, \ldots, \theta_K)$, and that the theory provides a set of $L > K$ moment conditions,

\[ E[m_l(y_i, x_i, z_i, \theta)] = E[m_{il}(\theta)] = 0, \]

where $y_i$, $x_i$,
and $z_i$ are variables that appear in the model and the subscript $i$ on $m_{il}(\theta)$ indicates the dependence on $(y_i, x_i, z_i)$. Denote the corresponding sample means as

$$\bar m_l(y, X, Z, \theta) = \frac{1}{n}\sum_{i=1}^{n} m_l(y_i, x_i, z_i, \theta) = \frac{1}{n}\sum_{i=1}^{n} m_{il}(\theta).$$

⁴ That is, of course, if there is any solution. In the regression model with collinearity, there are $K$ parameters but fewer than $K$ independent moment equations.

Unless the equations are functionally dependent, the system of $L$ equations in $K$ unknown parameters,

$$\bar m_l(\theta) = \frac{1}{n}\sum_{i=1}^{n} m_l(y_i, x_i, z_i, \theta) = 0, \quad l = 1, \ldots, L,$$

will not have a unique solution.⁵ It will be necessary to reconcile the $\binom{L}{K}$ different sets of estimates that can be produced. One possibility is to minimize a criterion function, such as the sum of squares,⁶

$$q = \sum_{l=1}^{L} \bar m_l^2 = \bar m(\theta)'\bar m(\theta). \tag{18-2}$$

It can be shown [see, e.g., Hansen (1982)] that under the assumptions we have made so far, specifically that $\operatorname{plim} \bar m(\theta) = E[\bar m(\theta)] = 0$, minimizing $q$ in (18-2) produces a consistent (albeit, as we shall see, possibly inefficient) estimator of $\theta$. We can, in fact, use as the criterion a weighted sum of squares,

$$q = \bar m(\theta)' W_n \bar m(\theta),$$

where $W_n$ is any positive definite matrix that may depend on the data but is not a function of $\theta$, such as $I$ in (18-2), to produce a consistent estimator of $\theta$.⁷ For example, we might use a diagonal matrix of weights if some information were available about the importance (by some measure) of the different moments. We do make the additional assumption that $\operatorname{plim} W_n$ = a positive definite matrix, $W$.

By the same logic that makes generalized least squares preferable to ordinary least squares, it should be beneficial to use a weighted criterion in which the weights are inversely proportional to the variances of the moments. Let $W$ be a diagonal matrix whose diagonal elements are the reciprocals of the variances of the individual moments,

$$w_{ll} = \frac{1}{\text{Asy.Var}[\sqrt{n}\,\bar m_l]} = \frac{1}{\phi_{ll}}.$$

(We have written it in this form to emphasize that the right-hand side involves the variance of a sample mean, which is of order $1/n$.) Then, a weighted least squares procedure would minimize

$$q = \bar m(\theta)'\Phi^{-1}\bar m(\theta). \tag{18-3}$$

⁵ It may if $L$ is greater than the sample size, $n$. We assume that $L$ is strictly less than $n$.
⁶ This approach is one that Quandt and Ramsey (1978) suggested for the problem in Example 18.3.
⁷ In principle, the weighting matrix can be a function of the parameters as well. See Hansen, Heaton and Yaron (1996) for discussion. Whether this provides any benefit in terms of the asymptotic properties of the estimator seems unlikely. The one payoff the authors do note is that certain estimators become invariant to the sort of normalization that we discussed in Example 17.1. In practical terms, this is likely to be a consideration only in a fairly small class of cases.

In general, the $L$ elements of $\bar m$ are freely correlated. In (18-3), we have used a diagonal $W$ that ignores this correlation. To use generalized least squares, we would define the full matrix,

$$W = \left\{\text{Asy.Var}[\sqrt{n}\,\bar m]\right\}^{-1} = \Phi^{-1}. \tag{18-4}$$

The estimators defined by choosing $\theta$ to minimize

$$q = \bar m(\theta)' W_n \bar m(\theta)$$

are minimum distance estimators. The general result is that if $W_n$ is a positive definite matrix and if $\operatorname{plim} \bar m(\theta) = 0$, then the minimum distance (generalized method of moments, or GMM) estimator of $\theta$ is consistent.⁸ Since the OLS criterion in (18-2) uses $I$, this method produces a consistent estimator, as does the weighted least squares estimator and the full GLS estimator. What remains to be decided is the best $W$ to use. Intuition might suggest (correctly) that the one defined in (18-4) would be optimal, once again based on the logic that motivates generalized least squares. This result is the now celebrated one of Hansen (1982).

The asymptotic covariance matrix of this generalized method of moments estimator is

$$V_{GMM} = \frac{1}{n}[\Gamma' W \Gamma]^{-1} = \frac{1}{n}[\Gamma'\Phi^{-1}\Gamma]^{-1}, \tag{18-5}$$

where $\Gamma$ is the matrix of derivatives with $j$th row equal to

$$\Gamma^{j} = \operatorname{plim}\frac{\partial \bar m_j(\theta)}{\partial \theta'}$$

and $\Phi = \text{Asy.Var}[\sqrt{n}\,\bar m]$. Finally, by virtue of the central limit theorem applied to the sample moments and the Slutsky theorem applied to this manipulation, we can expect the estimator to be asymptotically normally distributed. We will revisit the asymptotic properties of the estimator in Section 18.3.3.

Example 18.7 GMM Estimation of the Parameters of a Gamma Distribution

Referring once again to our earlier results in Example 18.5, we consider how to use all four of our sample moments to estimate the parameters of the gamma distribution.⁹ The four moment equations are

$$E\begin{bmatrix} y_i - P/\lambda \\ y_i^2 - P(P+1)/\lambda^2 \\ \ln y_i - \psi(P) + \ln\lambda \\ 1/y_i - \lambda/(P-1) \end{bmatrix} = \begin{bmatrix} 0 \\ 0 \\ 0 \\ 0 \end{bmatrix}.$$

⁸ In the most general cases, a number of other subtle conditions must be met so as to assert consistency and the other properties we discuss. For our purposes, the conditions given will suffice. Minimum distance estimators are discussed in Malinvaud (1970), Hansen (1982), and Amemiya (1985).
⁹ We emphasize that this example is constructed only to illustrate the computation of a GMM estimator. The gamma model is fully specified by the likelihood function, and the MLE is fully efficient. We will examine other cases that involve less detailed specifications later in the book.

The sample means of these will provide the moment equations for estimation. Let $y_1 = y$, $y_2 = y^2$, $y_3 = \ln y$, and $y_4 = 1/y$. Then

$$\bar m_1(P, \lambda) = \frac{1}{n}\sum_{i=1}^{n}(y_{i1} - P/\lambda) = \frac{1}{n}\sum_{i=1}^{n}[y_{i1} - \mu_1(P, \lambda)] = \bar y_1 - \mu_1(P, \lambda),$$

and likewise for $\bar m_2(P, \lambda)$, $\bar m_3(P, \lambda)$, and $\bar m_4(P, \lambda)$.

For our initial set of estimates, we will use ordinary least squares. The optimization problem is

$$\text{Minimize}_{P,\lambda}\; \sum_{l=1}^{4} \bar m_l(P, \lambda)^2 = \sum_{l=1}^{4}[\bar y_l - \mu_l(P, \lambda)]^2 = \bar m(P, \lambda)'\bar m(P, \lambda).$$

This estimator will be the minimum distance estimator with $W = I$. This nonlinear optimization problem must be solved iteratively. As starting values for the iterations, we used the maximum likelihood estimates from Example 18.5, $\hat P_{ML} = 2.4106$ and $\hat\lambda_{ML} = 0.0770702$. The least squares values that result from this procedure are $\hat P = 2.0582996$ and $\hat\lambda = 0.06579888$.

We can now use these to form our estimate of $W$. GMM estimation usually requires a first-step estimation such as this one to obtain the weighting matrix $W$. With these new estimates in hand, we obtained

$$\hat\Phi = \frac{1}{20}\sum_{i=1}^{20}\begin{bmatrix} y_{i1} - \hat P/\hat\lambda \\ y_{i2} - \hat P(\hat P+1)/\hat\lambda^2 \\ y_{i3} - \psi(\hat P) + \ln\hat\lambda \\ y_{i4} - \hat\lambda/(\hat P-1) \end{bmatrix}\begin{bmatrix} y_{i1} - \hat P/\hat\lambda \\ y_{i2} - \hat P(\hat P+1)/\hat\lambda^2 \\ y_{i3} - \psi(\hat P) + \ln\hat\lambda \\ y_{i4} - \hat\lambda/(\hat P-1) \end{bmatrix}'.$$

(Note, we could have computed $\hat\Phi$ using the maximum likelihood estimates.) The GMM estimator is now obtained by minimizing

$$q = \bar m(P, \lambda)'\hat\Phi^{-1}\bar m(P, \lambda).$$

The two estimates are $\hat P_{GMM} = 3.35894$ and $\hat\lambda_{GMM} = 0.124489$. At these two values, the value of the function is $q = 1.97522$. To obtain an asymptotic covariance matrix for the two estimates, we first recompute $\hat\Phi$ as shown above (only the lower triangle of the symmetric matrix is shown):

$$\hat\Phi = \frac{1}{20}\begin{bmatrix} 24.7051 & & & \\ 2307.126 & 229{,}609.5 & & \\ 0.6974 & 58.8148 & 0.0230 & \\ -0.0283 & -2.1423 & -0.0011 & 0.000065413 \end{bmatrix}.$$

To complete the computation, we will require the derivatives matrix,

$$\bar G'(\theta) = \begin{bmatrix} \partial\bar m_1/\partial P & \partial\bar m_2/\partial P & \partial\bar m_3/\partial P & \partial\bar m_4/\partial P \\ \partial\bar m_1/\partial\lambda & \partial\bar m_2/\partial\lambda & \partial\bar m_3/\partial\lambda & \partial\bar m_4/\partial\lambda \end{bmatrix} = \begin{bmatrix} -1/\lambda & -(2P+1)/\lambda^2 & -\psi'(P) & \lambda/(P-1)^2 \\ P/\lambda^2 & 2P(P+1)/\lambda^3 & 1/\lambda & -1/(P-1) \end{bmatrix},$$

$$\bar G'(\hat\theta) = \begin{bmatrix} -8.0328 & -498.01 & -0.34635 & 0.022372 \\ 216.74 & 15178.2 & 8.0328 & -0.42392 \end{bmatrix}.$$

Finally,

$$\frac{1}{20}[\hat G'\hat\Phi^{-1}\hat G]^{-1} = \begin{bmatrix} 0.202201 & 0.0117344 \\ 0.0117344 & 0.000867519 \end{bmatrix}$$

gives the estimated asymptotic covariance matrix for the estimators. Recall that in Example 18.5, we obtained maximum likelihood estimates of the same parameters. Table 18.1 summarizes. Looking ahead, we should have expected the GMM estimator to improve the standard errors. The fact that it does for $P$ but not for $\lambda$ might cast some suspicion on the specification of the model. In fact, the data generating process underlying these data is not a gamma population; the values were hand picked by the author. Thus, the findings in Table 18.1 might not be surprising. We will return to this issue in Section 18.4.1.

TABLE 18.1 Estimates of the Parameters of a Gamma Distribution

Parameter         Maximum Likelihood    Generalized Method of Moments
P                 2.4106                3.3589
Standard Error    (0.87683)             (0.449667)
λ                 0.0770701             0.12449
Standard Error    (0.02707)             (0.029099)

18.3.3 PROPERTIES OF THE GMM ESTIMATOR

We will now examine the properties of the GMM estimator in some detail. Since the GMM estimator includes other familiar estimators that we have already encountered, including least squares (linear and nonlinear), instrumental variables, and maximum likelihood, these results will extend to those cases. The discussion given here will only sketch the elements of the formal proofs. The assumptions we make here are somewhat narrower than a fully general treatment might allow, but they are broad enough to include the situations likely to arise in practice. More detailed and rigorous treatments may be found in, for example, Newey and McFadden (1994), White (2001), Hayashi (2000), Mittelhammer et al. (2000), or Davidson (2000). This development will continue the analysis begun in Section 10.4 and add some detail to the formal results of Section 16.5.

The GMM estimator is based on the set of population orthogonality conditions,

$$E[m_i(\theta_0)] = 0$$

where we denote the true parameter vector by $\theta_0$. The subscript $i$ on $m_i(\theta_0)$ indicates dependence on the observed data, $y_i, x_i, z_i$. Averaging this over the sample observations produces the sample moment equation

$$E[\bar m_n(\theta_0)] = 0$$

where

$$\bar m_n(\theta_0) = \frac{1}{n}\sum_{i=1}^{n} m_i(\theta_0).$$

This moment is a set of $L$ equations involving the $K$ parameters. We will assume that this expectation exists and that the sample counterpart converges to it. The definitions are cast in terms of the population parameters and are indexed by the sample size. To fix the ideas, consider, once again, the empirical moment equations which define the instrumental variable estimator for a linear or nonlinear regression model.

Example 18.8 Empirical Moment Equation for Instrumental Variables

For the IV estimator in the linear or nonlinear regression model, we assume

$$E[\bar m_n(\beta)] = E\left[\frac{1}{n}\sum_{i=1}^{n} z_i[y_i - h(x_i, \beta)]\right] = 0.$$

There are $L$ instrumental variables in $z_i$ and $K$ parameters in $\beta$. This statement defines $L$ moment equations, one for each instrumental variable.

We make the following assumptions about the model and these empirical moments:

ASSUMPTION 18.1. Convergence of the Empirical Moments: The data generating process is assumed to meet the conditions for a law of large numbers to apply, so that we may assume that the empirical moments converge in probability to their expectation. Appendix D lists several different laws of large numbers that increase in generality. What is required for this assumption is that

$$\bar m_n(\theta_0) = \frac{1}{n}\sum_{i=1}^{n} m_i(\theta_0) \xrightarrow{p} 0.$$

The laws of large numbers that we examined in Appendix D accommodate cases of independent observations. Cases of dependent or correlated observations can be gathered under the Ergodic Theorem (12.1). For this more general case, then, we would assume that the sequence of observations $m(\theta)$ constitutes a jointly ($L \times 1$) stationary and ergodic process.

The empirical moments are assumed to be continuous and continuously differentiable functions of the parameters. For our example above, this would mean that the conditional mean function, $h(x_i, \beta)$, is a continuous function of $\beta$ (though not necessarily of $x_i$).

With continuity and differentiability, we also will be able to assume that the derivatives of the moments,

$$\bar G_n(\theta_0) = \frac{\partial \bar m_n(\theta_0)}{\partial \theta_0'} = \frac{1}{n}\sum_{i=1}^{n}\frac{\partial m_{i,n}(\theta_0)}{\partial \theta_0'},$$

converge to a probability limit, say $\operatorname{plim} \bar G_n(\theta_0) = \bar G(\theta_0)$. For sets of independent observations, the continuity of the functions and the derivatives will allow us to invoke the Slutsky Theorem to obtain this result. For the more general case of sequences of dependent observations, Theorem 12.2, Ergodicity of Functions, will provide a counterpart to the Slutsky Theorem for time series data. In sum, if the moments themselves obey a law of large numbers, then it is reasonable to assume that the derivatives do as well.

ASSUMPTION 18.2. Identification: For any $n \ge K$, if $\theta_1$ and $\theta_2$ are two different parameter vectors, then there exist data sets such that $\bar m_n(\theta_1) \ne \bar m_n(\theta_2)$. Formally, in Section 16.5.3, identification is defined to imply that the probability limit of the GMM criterion function is uniquely minimized at the true parameters, $\theta_0$.

Assumption 18.2 is a practical prescription for identification. More formal conditions are discussed in Section 16.5.3. We have examined two violations of this crucial assumption. In the linear regression model, one of the assumptions is full rank of the matrix of exogenous variables, that is, the absence of multicollinearity in $X$. In our discussion of the maximum likelihood estimator, we encountered a case (Example 17.2) in which a normalization was needed to identify the vector of parameters. [See Hansen et al. (1996) for discussion of this case.] Both of these cases are included in this assumption. The identification condition has three important implications:

Order Condition. The number of moment conditions is at least as large as the number of parameters; $L \ge K$. This is necessary but not sufficient for identification.

Rank Condition. The $L \times K$ matrix of derivatives, $\bar G_n(\theta_0)$, will have row rank equal to $K$. (Again, note that the number of rows must equal or exceed the number of columns.)

Uniqueness. With the continuity assumption, the identification assumption implies that the parameter vector that satisfies the population moment condition is unique. We know that at the true parameter vector, $\operatorname{plim} \bar m_n(\theta_0) = 0$. If $\theta_1$ is any parameter vector that satisfies this condition, then $\theta_1$ must equal $\theta_0$.

Assumptions 18.1 and 18.2 characterize the parameterization of the model. Together they establish that the parameter vector will be estimable. We now make the statistical assumption that will allow us to establish the properties of the GMM estimator.

ASSUMPTION 18.3. Asymptotic Distribution of Empirical Moments: We assume that the empirical moments obey a central limit theorem. This assumes that the moments have a finite asymptotic covariance matrix, $(1/n)\Phi$, so that

$$\sqrt{n}\,\bar m_n(\theta_0) \xrightarrow{d} N[0, \Phi].$$

The underlying requirements on the data for this
assumption to hold will vary and will be complicated if the observations comprising the empirical moment are not independent. For samples of independent observations, we assume the conditions underlying the Lindberg–Feller (D.19) or Liapounov Central Limit Theorem (D.20) will suffice. For the more general case, it is once again necessary to make some assumptions about the data. We have assumed that

$$E[m_i(\theta_0)] = 0.$$

If we can go a step further and assume that the functions $m_i(\theta_0)$ are an ergodic, stationary martingale difference series,

$$E[m_i(\theta_0) \mid m_{i-1}(\theta_0), m_{i-2}(\theta_0), \ldots] = 0,$$

then we can invoke Theorem 12.3, the Central Limit Theorem for Martingale Difference Series. It will generally be fairly complicated to verify this assumption for nonlinear models, so it will usually be assumed outright. On the other hand, the assumptions are likely to be fairly benign in a typical application. For regression models, the assumption takes the form

$$E[z_i\varepsilon_i \mid z_{i-1}\varepsilon_{i-1}, \ldots] = 0$$

which will often be part of the central structure of the model.

With the assumptions in place, we have

THEOREM 18.1 Asymptotic Distribution of the GMM Estimator
Under the preceding assumptions,

$$\hat\theta_{GMM} \xrightarrow{p} \theta, \qquad \hat\theta_{GMM} \overset{a}{\sim} N[\theta, V_{GMM}], \tag{18-6}$$

where $V_{GMM}$ is defined in (18-5).

We will now sketch a proof of Theorem 18.1. The GMM estimator is obtained by minimizing the criterion function

$$q_n(\theta) = \bar m_n(\theta)' W_n \bar m_n(\theta)$$

where $W_n$ is the weighting matrix used. Consistency of the estimator that minimizes this criterion can be established by the same logic we used for the maximum likelihood estimator. It must first be established that $q_n(\theta)$ converges to a value $q_0(\theta)$. By our assumptions of strict continuity and Assumption 18.1, $q_n(\theta_0)$ converges to 0. (We could apply the Slutsky theorem to obtain this result.) We will assume that $q_n(\theta)$ converges to $q_0(\theta)$ for other points in the parameter space as well. Since $W_n$ is positive definite, for any finite $n$, we know that

$$0 \le q_n(\hat\theta_{GMM}) \le q_n(\theta_0). \tag{18-7}$$

That is, in the finite sample, $\hat\theta_{GMM}$ actually minimizes the function, so the sample value of the criterion is not larger at $\hat\theta_{GMM}$ than at any other value, including the true parameters. But, at the true parameter values, $q_n(\theta_0) \xrightarrow{p} 0$. So, if (18-7) is true, then it must follow that $q_n(\hat\theta_{GMM}) \xrightarrow{p} 0$ as well because of the identification assumption, 18.2. As $n \to \infty$, $q_n(\hat\theta_{GMM})$ and $q_n(\theta)$ converge to the same limit. It must be the case, then, that as $n \to \infty$, $\bar m_n(\hat\theta_{GMM}) \to \bar m_n(\theta_0)$, since the function is quadratic and $W$ is positive definite. The identification condition that we assumed earlier now assures that as $n \to \infty$, $\hat\theta_{GMM}$ must equal $\theta_0$. This establishes consistency of the estimator.

We will now sketch a proof of the asymptotic normality of the estimator. The first order conditions for the GMM estimator are

$$\frac{\partial q_n(\hat\theta_{GMM})}{\partial \hat\theta_{GMM}} = 2\bar G_n(\hat\theta_{GMM})' W_n \bar m_n(\hat\theta_{GMM}) = 0. \tag{18-8}$$

(The leading 2 is irrelevant to the solution, so it will be dropped at this point.) The orthogonality equations are assumed to be continuous and continuously differentiable. This allows us to employ the mean value theorem as we expand the empirical moments in a linear Taylor series around the true value, $\theta_0$:

$$\bar m_n(\hat\theta_{GMM}) = \bar m_n(\theta_0) + \bar G_n(\bar\theta)(\hat\theta_{GMM} - \theta_0), \tag{18-9}$$

where $\bar\theta$ is a point between $\hat\theta_{GMM}$ and the true parameters, $\theta_0$. Thus, for each element $\bar\theta_k = w_k\hat\theta_{k,GMM} + (1 - w_k)\theta_{0,k}$ for some $w_k$ such that $0 < w_k < 1$.

... ($L > K$) the parameters of the model. For convenience, define

$$e(X, \hat\beta) = y_i - h(x_i, \hat\beta), \quad i = 1, \ldots, n,$$

and

$$Z = n \times L \text{ matrix whose } i\text{th row is } z_i'.$$

By a straightforward extension of our earlier results, we can produce a GMM estimator of $\beta$. The sample moments will be

$$\bar m_n(\beta) = \frac{1}{n}\sum_{i=1}^{n} z_i e(x_i, \beta) = \frac{1}{n}Z'e(X, \beta).$$

The minimum distance estimator will be the $\hat\beta$ that minimizes

$$q = \bar m_n(\hat\beta)' W \bar m_n(\hat\beta) = \left[\frac{1}{n}e(X, \hat\beta)'Z\right] W \left[\frac{1}{n}Z'e(X, \hat\beta)\right] \tag{18-13}$$

for some choice of $W$ that we have yet to determine. The criterion given above produces the nonlinear instrumental variable estimator. If we use $W = (Z'Z)^{-1}$, then we have exactly the estimation criterion we used in Section 9.5.1 where we defined the nonlinear instrumental variables estimator. Apparently (18-13) is more general, since we are not limited to this choice of $W$. The linear IV estimator is a special case. For any given choice of $W$, as long as there are enough orthogonality conditions to identify the parameters, estimation by minimizing $q$ is, at least in principle, a straightforward problem in nonlinear optimization. Hansen (1982) showed that the optimal choice of $W$ for this estimator is

$$W_{GMM} = \left\{\text{Asy.Var}[\sqrt{n}\,\bar m_n(\beta)]\right\}^{-1} = \left\{\text{Asy.Var}\left[\frac{1}{\sqrt{n}}\sum_{i=1}^{n} z_i\varepsilon_i\right]\right\}^{-1} = \left\{\text{Asy.Var}\left[\frac{1}{\sqrt{n}}Z'e(X, \beta)\right]\right\}^{-1}. \tag{18-14}$$

For our model, the asymptotic variance in (18-14) is

$$\frac{1}{n}\sum_{i=1}^{n}\sum_{j=1}^{n}\text{Cov}[z_i\varepsilon_i, z_j\varepsilon_j] = \frac{1}{n}\sum_{i=1}^{n}\sum_{j=1}^{n}\sigma_{ij}z_iz_j' = \frac{Z'\Omega Z}{n}.$$

If we insert this result in (18-13), we obtain the criterion for the GMM estimator:

$$q = \left[\frac{1}{n}e(X, \hat\beta)'Z\right]\left(\frac{Z'\Omega Z}{n}\right)^{-1}\left[\frac{1}{n}Z'e(X, \hat\beta)\right].$$

There is a possibly difficult detail to be considered. The GMM estimator involves

$$\frac{1}{n}Z'\Omega Z = \frac{1}{n}\sum_{i=1}^{n}\sum_{j=1}^{n} z_iz_j'\,\text{Cov}[\varepsilon_i, \varepsilon_j] = \frac{1}{n}\sum_{i=1}^{n}\sum_{j=1}^{n} z_iz_j'\,\text{Cov}[(y_i - h(x_i, \beta)), (y_j - h(x_j, \beta))].$$

The conditions under which such a double sum might converge to a positive definite matrix are sketched in Sections 5.3.2 and 12.4.1. Assuming that they do hold, estimation appears to require that an estimate of $\beta$ be in hand already, even though it is the object of estimation. It may be that a consistent but inefficient estimator of $\beta$ is available. Suppose for the present that one is. If observations are uncorrelated, then the cross observations terms may be omitted, and what is required is

$$\frac{1}{n}Z'\Omega Z = \frac{1}{n}\sum_{i=1}^{n} z_iz_i'\,\text{Var}[(y_i - h(x_i, \beta))].$$

We can use the White (1980) estimator discussed in Sections 11.2.2 and 11.3 for this case:

$$S_0 = \frac{1}{n}\sum_{i=1}^{n} z_iz_i'(y_i - h(x_i, \hat\beta))^2. \tag{18-15}$$

If the disturbances are autocorrelated but the process is stationary, then Newey and West's (1987a) estimator is available (assuming that the autocorrelations are sufficiently small at a reasonable lag, $p$):

$$S = S_0 + \frac{1}{n}\sum_{\ell=1}^{p} w(\ell)\sum_{i=\ell+1}^{n} e_ie_{i-\ell}(z_iz_{i-\ell}' + z_{i-\ell}z_i') = \sum_{\ell=0}^{p} w(\ell)S_\ell, \tag{18-16}$$

where

$$w(\ell) = 1 - \frac{\ell}{p+1}.$$

The maximum lag length $p$ must be determined in advance. We will require that observations that are far apart in time (that is, for which $|i - \ell|$ is large) must have increasingly smaller covariances for us to establish the convergence results that justify OLS, GLS, and now GMM estimation. The choice of $p$ is a reflection of how far back in time one must go to consider the autocorrelation negligible for purposes of estimating $(1/n)Z'\Omega Z$. Current practice suggests using the smallest integer greater than or equal to $T^{1/4}$.

Still left open is the question of where the initial consistent estimator should be obtained. One possibility is to obtain an inefficient but consistent GMM estimator by using $W = I$ in (18-13). That is, use a nonlinear (or linear, if the equation is linear) instrumental variables estimator. This first-step estimator can then be used to construct $W$, which, in turn, can then be used in the GMM estimator. Another possibility is that $\beta$ may be consistently estimable by some straightforward procedure other than GMM.

Once the GMM estimator has been computed, its asymptotic covariance matrix and asymptotic distribution can be estimated based on (18-11) and (18-12). Recall that

$$\bar m_n(\beta) = \frac{1}{n}\sum_{i=1}^{n} z_i\varepsilon_i,$$

which is a sum of $L \times 1$ vectors. The derivative, $\partial\bar m_n(\beta)/\partial\beta'$, is a sum of $L \times K$ matrices, so

$$\bar G(\beta) = \frac{\partial\bar m(\beta)}{\partial\beta'} = \frac{1}{n}\sum_{i=1}^{n} G_i(\beta) = \frac{1}{n}\sum_{i=1}^{n} z_i\left[\frac{\partial\varepsilon_i}{\partial\beta'}\right]. \tag{18-17}$$

In the model we are considering here,

$$\frac{\partial\varepsilon_i}{\partial\beta'} = \frac{-\partial h(x_i, \beta)}{\partial\beta'}.$$

The derivatives are the pseudoregressors in the linearized regression model that we examined in Section 9.2.3. Using the notation defined there,

$$\frac{\partial\varepsilon_i}{\partial\beta} = -x_{i0},$$

so

$$\bar G(\beta) = \frac{1}{n}\sum_{i=1}^{n} G_i(\beta) = \frac{1}{n}\sum_{i=1}^{n} -z_ix_{i0}' = -\frac{1}{n}Z'X_0. \tag{18-18}$$

With this matrix in hand, the estimated asymptotic covariance matrix for the GMM estimator is

$$\text{Est.Asy.Var}[\hat\beta] = \left[\bar G(\hat\beta)'\left(\frac{1}{n}Z'\hat\Omega Z\right)^{-1}\bar G(\hat\beta)\right]^{-1} = [(X_0'Z)(Z'\hat\Omega Z)^{-1}(Z'X_0)]^{-1}. \tag{18-19}$$

(The two minus signs, a $1/n^2$ and an $n^2$, all fall out of the result.)

If the $\Omega$ that appears in (18-19) were $\sigma^2 I$, then (18-19) would be precisely the asymptotic covariance matrix that appears in Theorem 5.4 for linear models and Theorem 9.3 for nonlinear models. But there is an interesting distinction between this estimator and the IV estimators discussed earlier. In the earlier cases, when there were more instrumental variables than parameters, we resolved the overidentification by specifically choosing a set of $K$ instruments, the $K$ projections of the columns of $X$ or $X_0$ into the column space of $Z$. Here, in contrast, we do not attempt to resolve the overidentification; we simply use all the instruments and minimize the GMM criterion. Now you should be able to show that when $\Omega = \sigma^2 I$ and we use this information, when all is said and done, the same parameter estimates will be obtained. But, if we use a weighting matrix that differs from $W = (Z'Z/n)^{-1}$, then they are not.

18.4 TESTING HYPOTHESES IN THE GMM FRAMEWORK

The estimation framework developed in the previous section provides the basis for a convenient set of statistics for testing hypotheses. We will consider three groups of tests. The first is a pair of statistics that is used for testing the validity of the restrictions that produce the moment equations. The second is a trio of tests that correspond to the familiar Wald, LM, and LR tests that we have examined at several points in the preceding chapters. The third is a class of tests based on the theoretical underpinnings of the conditional moments that we used earlier to devise the GMM estimator.

18.4.1 TESTING THE VALIDITY OF THE MOMENT RESTRICTIONS

In the exactly identified cases we examined earlier (least squares, instrumental variables, maximum likelihood), the criterion for GMM estimation

$$q = \bar m(\theta)' W \bar m(\theta)$$

would be exactly zero because we can find a set of estimates for which $\bar m(\theta)$ is exactly zero. Thus in the exactly identified case, when there are the same number of moment equations as there are parameters to estimate, the weighting matrix $W$ is irrelevant to the solution. But if the parameters are overidentified by the moment equations, then these equations imply substantive restrictions. As such, if the hypothesis of the model that led to the moment equations in the first place is incorrect, at least some of the sample moment restrictions will be systematically violated. This conclusion provides the basis for a test of the overidentifying restrictions. By construction, when the optimal weighting matrix is used,

$$nq = [\sqrt{n}\,\bar m(\hat\theta)]'\left\{\text{Est.Asy.Var}[\sqrt{n}\,\bar m(\hat\theta)]\right\}^{-1}[\sqrt{n}\,\bar m(\hat\theta)],$$

so $nq$ is a Wald statistic. Therefore, under the hypothesis of the model,

$$nq \xrightarrow{d} \chi^2[L - K].$$

(For the exactly identified case, there are zero degrees of freedom and $q = 0$.)

Example 18.9 Overidentifying Restrictions

In Hall's consumption model with the corollary, the two orthogonality conditions noted in Example 18.6 exactly identify the two parameters. But his analysis of the model suggests a way to test the specification. The conclusion, "No information available in time $t$ apart from the level of consumption, $c_t$, helps predict future consumption, $c_{t+1}$, in the sense of affecting the expected value of marginal utility. In particular, income or wealth in periods $t$ or earlier are irrelevant once $c_t$ is known" suggests how one might test the model. If lagged values of income ($Y_t$ might equal the ratio of current income to the previous period's income) are added to the set of instruments, then the model is now overidentified by the orthogonality conditions;

$$E_t\left[\left(\beta(1 + r_{t+1})R_{t+1}^{\lambda} - 1\right) \times \begin{bmatrix} 1 \\ R_t \\ Y_{t-1} \\ Y_{t-2} \end{bmatrix}\right] = \begin{bmatrix} 0 \\ 0 \\ 0 \\ 0 \end{bmatrix}.$$

A simple test of the overidentifying restrictions would be suggestive of the validity of the model. Rejecting the restrictions casts doubt on the original model. Hall's proposed tests to distinguish the life cycle-permanent income model from other theories of consumption involved adding two lags of income to the information set. His test is more involved than the one suggested above. Hansen and Singleton (1982) operated directly on this form of the model. Other studies, for example, Campbell and Mankiw (1989) as well as Hall's, used the model's implications to formulate more conventional instrumental variable regression models.

The preceding is a specification test, not a test of parametric restrictions. However, there is a symmetry between the moment restrictions and restrictions on the parameter vector. Suppose $\theta$ is subjected to $J$ restrictions (linear or nonlinear) which restrict the number of free parameters from $K$ to $K - J$. (That is, reduce the dimensionality of the parameter space from $K$ to $K - J$.) The nature of the GMM estimation problem we have posed is not changed at all by the restrictions. The constrained problem may be stated in terms of

$$q_R = \bar m(\theta_R)' W \bar m(\theta_R).$$

Note that the weighting matrix, $W$, is unchanged. The precise nature of the solution method may be changed, since the restrictions mandate a constrained optimization. However, the criterion is essentially unchanged. It follows then that

$$nq_R \xrightarrow{d} \chi^2[L - (K - J)].$$

This result suggests a method of testing the restrictions, though the distribution theory is not obvious. The weighted sum of squares with the restrictions imposed, $nq_R$, must be at least as large as the weighted sum of squares obtained without the restrictions, $nq$. The difference is

$$(nq_R - nq) \xrightarrow{d} \chi^2[J]. \tag{18-20}$$

The test is attributed to Newey and West (1987b). This provides one method of testing a set of restrictions. (The small-sample properties of this test will be the central focus of the application discussed in Section 18.5.) We now consider several alternatives.

18.4.2 GMM COUNTERPARTS TO THE WALD, LM, AND LR TESTS

Section 17.5 described a trio of testing procedures that can be applied to a hypothesis in the context of maximum likelihood estimation. To reiterate, let the hypothesis to be tested be a set of $J$ possibly nonlinear restrictions on $K$ parameters $\theta$ in the form $H_0: r(\theta) = 0$. Let $c_1$ be the maximum likelihood estimates of $\theta$ estimated without the restrictions, and let $c_0$ denote the restricted maximum likelihood estimates, that is, the estimates obtained while imposing the null hypothesis. The three statistics, which are asymptotically equivalent, are obtained as follows:

$$LR = \text{likelihood ratio} = -2(\ln L_0 - \ln L_1),$$

where

$$\ln L_j = \text{log likelihood function evaluated at } c_j, \quad j = 0, 1.$$

The likelihood ratio statistic requires that both estimates be computed. The Wald statistic is

$$W = \text{Wald} = [r(c_1)]'\left\{\text{Est.Asy.Var}[r(c_1)]\right\}^{-1}[r(c_1)]. \tag{18-21}$$

The Wald statistic is the distance measure for the degree to which the unrestricted estimator fails to satisfy the restrictions. The usual estimator for the asymptotic covariance matrix would be

$$\text{Est.Asy.Var}[r(c_1)] = A_1\left\{\text{Est.Asy.Var}[c_1]\right\}A_1', \tag{18-22}$$

where

$$A_1 = \partial r(c_1)/\partial c_1' \quad (A_1 \text{ is a } J \times K \text{ matrix}).$$

The Wald statistic can be computed using only the unrestricted estimate. The LM statistic is

$$LM = \text{Lagrange multiplier} = g_1(c_0)'\left\{\text{Est.Asy.Var}[g_1(c_0)]\right\}^{-1}g_1(c_0), \tag{18-23}$$

where

$$g_1(c_0) = \partial\ln L_1(c_0)/\partial c_0,$$

that is, the first derivatives of the unconstrained log-likelihood computed at the restricted estimates. The term $\text{Est.Asy.Var}[g_1(c_0)]$ is the inverse of any of the usual estimators of the asymptotic covariance matrix of the maximum likelihood estimators of the parameters, computed using the restricted estimates. The most convenient choice is usually the BHHH estimator. The LM statistic is based on the restricted estimates.

Newey and West (1987b) have devised counterparts to these test statistics for the GMM estimator. The Wald statistic is computed identically, using the results of GMM estimation rather than maximum likelihood.¹⁰ That is, in (18-21), we would use the unrestricted GMM estimator of $\theta$. The appropriate asymptotic covariance matrix is (18-12). The computation is exactly the same. The counterpart to the LR statistic is the difference in the values of $nq$ in (18-20). It is necessary to use the same weighting matrix, $W$, in both restricted and unrestricted estimators. Since the unrestricted estimator is consistent under both $H_0$ and $H_1$, a consistent, unrestricted estimator of $\theta$ is used to compute $W$. Label this $\hat\Phi_1^{-1} = \left\{\text{Asy.Var}[\sqrt{n}\,\bar m_1(c_1)]\right\}^{-1}$. In each occurrence, the subscript 1 indicates reference to the unrestricted estimator. Then $q$ is minimized without restrictions to obtain $q_1$ and then subject to the restrictions to obtain $q_0$. The statistic is then $(nq_0 - nq_1)$.¹¹ Since we are using the same $W$ in both cases, this statistic is necessarily nonnegative. (This is the statistic discussed in Section 18.4.1.)

Finally, the counterpart
to the LM statistic would be

$$LM_{GMM} = n\left[\bar m_1(c_0)'\hat\Phi_1^{-1}\bar G_1(c_0)\right]\left[\bar G_1(c_0)'\hat\Phi_1^{-1}\bar G_1(c_0)\right]^{-1}\left[\bar G_1(c_0)'\hat\Phi_1^{-1}\bar m_1(c_0)\right].$$

¹⁰ See Burnside and Eichenbaum (1996) for some small-sample results on this procedure. Newey and McFadden (1994) have shown the asymptotic equivalence of the three procedures.
¹¹ Newey and West label this test the D test.

The logic for this LM statistic is the same as that for the MLE. The derivatives of the minimized criterion $q$ in (18-3) are

$$g_1(c_0) = \frac{\partial q}{\partial c_0} = 2\bar G_1(c_0)'\hat\Phi_1^{-1}\bar m(c_0).$$

The LM statistic, $LM_{GMM}$, is a Wald statistic for testing the hypothesis that this vector equals zero under the restrictions of the null hypothesis. From our earlier results, we would have

$$\text{Est.Asy.Var}[g_1(c_0)] = \frac{4}{n}\bar G_1(c_0)'\hat\Phi_1^{-1}\left\{\text{Est.Asy.Var}[\sqrt{n}\,\bar m(c_0)]\right\}\hat\Phi_1^{-1}\bar G_1(c_0).$$

The estimated asymptotic variance of $\sqrt{n}\,\bar m(c_0)$ is $\hat\Phi_1$, so

$$\text{Est.Asy.Var}[g_1(c_0)] = \frac{4}{n}\bar G_1(c_0)'\hat\Phi_1^{-1}\bar G_1(c_0).$$

The Wald statistic would be

$$\text{Wald} = g_1(c_0)'\left\{\text{Est.Asy.Var}[g_1(c_0)]\right\}^{-1}g_1(c_0) = n\,\bar m_1(c_0)'\hat\Phi_1^{-1}\bar G(c_0)\left\{\bar G(c_0)'\hat\Phi_1^{-1}\bar G(c_0)\right\}^{-1}\bar G(c_0)'\hat\Phi_1^{-1}\bar m_1(c_0). \tag{18-24}$$

18.5 APPLICATION: GMM ESTIMATION OF A DYNAMIC PANEL DATA MODEL OF LOCAL GOVERNMENT EXPENDITURES

(This example continues the analysis begun in Example 13.7.) Dahlberg and Johansson (2000) estimated a model for the local government expenditure of several hundred municipalities in Sweden observed over the 9-year period $t$ = 1979 to 1987. The equation of interest is

$$S_{i,t} = \alpha_t + \sum_{j=1}^{m}\beta_jS_{i,t-j} + \sum_{j=1}^{m}\gamma_jR_{i,t-j} + \sum_{j=1}^{m}\delta_jG_{i,t-j} + f_i + \varepsilon_{it}$$

for $i = 1, \ldots, N = 265$ and $t = m+1, \ldots, 9$. (We have changed their notation slightly to make it more convenient.) $S_{i,t}$, $R_{i,t}$ and $G_{i,t}$ are municipal spending, receipts (taxes and fees) and central government grants, respectively. Analogous equations are specified for the current values of $R_{i,t}$ and $G_{i,t}$. The appropriate lag length, $m$, is one of the features of interest to be determined by the empirical study. The model contains a municipality specific effect, $f_i$, which is not specified as being either "fixed" or "random." In order to eliminate the individual effect, the model is converted to first differences. The resulting equation is

$$\Delta S_{i,t} = \lambda_t + \sum_{j=1}^{m}\beta_j\Delta S_{i,t-j} + \sum_{j=1}^{m}\gamma_j\Delta R_{i,t-j} + \sum_{j=1}^{m}\delta_j\Delta G_{i,t-j} + u_{it}$$

or

$$y_{i,t} = x_{i,t}'\theta + u_{i,t},$$

where $\Delta S_{i,t} = S_{i,t} - S_{i,t-1}$ and so on and $u_{i,t} = \varepsilon_{i,t} - \varepsilon_{i,t-1}$. This removes the group effect and leaves the time effect. Since the time effect was unrestricted to begin with, $\Delta\alpha_t = \lambda_t$ remains an unrestricted time effect, which is treated as "fixed" and modeled with a time-specific dummy variable. The maximum lag length is set at $m = 3$. With 9 years of data, this leaves usable observations from 1983 to 1987 for estimation, that is, $t = m+2, \ldots, 9$. Similar equations were fit for $R_{i,t}$ and $G_{i,t}$.

The orthogonality conditions claimed by the authors are

$$E[S_{i,s}u_{i,t}] = E[R_{i,s}u_{i,t}] = E[G_{i,s}u_{i,t}] = 0, \quad s = 1, \ldots, t-2.$$

The orthogonality conditions are stated in terms of the levels of the financial variables and the differences of the disturbances. The issue of this formulation as opposed to, for example, $E[\Delta S_{i,s}\Delta\varepsilon_{i,t}] = 0$ (which is implied) is discussed by Ahn and Schmidt (1995). As we shall see, this set of orthogonality conditions implies a total of 80 instrumental variables. The authors use only the first of the three sets listed above, which produces a total of 30. For the five observations, using the formulation developed in Section 13.6, we have the following matrix of instrumental variables for the orthogonality conditions

$$Z_i = \begin{bmatrix} S_{81-79}' & d_{83} & 0' & 0 & 0' & 0 & 0' & 0 & 0' & 0 \\ 0' & 0 & S_{82-79}' & d_{84} & 0' & 0 & 0' & 0 & 0' & 0 \\ 0' & 0 & 0' & 0 & S_{83-79}' & d_{85} & 0' & 0 & 0' & 0 \\ 0' & 0 & 0' & 0 & 0' & 0 & S_{84-79}' & d_{86} & 0' & 0 \\ 0' & 0 & 0' & 0 & 0' & 0 & 0' & 0 & S_{85-79}' & d_{87} \end{bmatrix} \begin{matrix} 1983 \\ 1984 \\ 1985 \\ 1986 \\ 1987 \end{matrix}$$

where the notation $E_{t_1-t_0}$ indicates the range of years for that variable. For example, $S_{83-79}$ denotes $[S_{i,1983}, S_{i,1982}, S_{i,1981}, S_{i,1980}, S_{i,1979}]'$ and $d_{year}$ denotes the year specific dummy variable. Counting columns in $Z_i$ we see that using only the lagged values of the dependent variable and the time dummy variables, we have (3+1) + (4+1) + (5+1) + (6+1) + (7+1) = 30 instrumental variables. Using the lagged values of the other two variables in each equation would add 50 more, for a total of 80 if all the orthogonality conditions suggested above were employed. Given the construction above, the orthogonality conditions are now

$$E[Z_i'u_i] = 0,$$

where $u_i = [u_{i,1987}, u_{i,1986}, u_{i,1985}, u_{i,1984}, u_{i,1983}]'$. The empirical moment equation is

$$\operatorname{plim}\left[\frac{1}{n}\sum_{i=1}^{N} Z_i'u_i\right] = \operatorname{plim}\bar m(\theta) = 0.$$

The parameters are vastly overidentified. Using only the lagged values of the dependent variable in each of the three equations estimated, there are 30 moment conditions and 14 parameters being estimated when $m = 3$, 11 when $m = 2$, 8 when $m = 1$ and 5 when $m = 0$. (As we do our estimation of each of these, we will retain the same matrix of instrumental variables in each case.) GMM estimation proceeds in two steps. In the first step, the basic, unweighted instrumental variables estimator is computed using

$$\hat\theta_{IV} = \left[\left(\sum_{i=1}^{N} X_i'Z_i\right)\left(\sum_{i=1}^{N} Z_i'Z_i\right)^{-1}\left(\sum_{i=1}^{N} Z_i'X_i\right)\right]^{-1}\left(\sum_{i=1}^{N} X_i'Z_i\right)\left(\sum_{i=1}^{N} Z_i'Z_i\right)^{-1}\left(\sum_{i=1}^{N} Z_i'y_i\right)$$

where

$$y_i = (\Delta S_{83}\;\; \Delta S_{84}\;\; \Delta S_{85}\;\; \Delta S_{86}\;\; \Delta S_{87})'$$

and

$$X_i = \begin{bmatrix} \Delta S_{82} & \Delta S_{81} & \Delta S_{80} & \Delta R_{82} & \Delta R_{81} & \Delta R_{80} & \Delta G_{82} & \Delta G_{81} & \Delta G_{80} & 1 & 0 & 0 & 0 & 0 \\ \Delta S_{83} & \Delta S_{82} & \Delta S_{81} & \Delta R_{83} & \Delta R_{82} & \Delta R_{81} & \Delta G_{83} & \Delta G_{82} & \Delta G_{81} & 0 & 1 & 0 & 0 & 0 \\ \Delta S_{84} & \Delta S_{83} & \Delta S_{82} & \Delta R_{84} & \Delta R_{83} & \Delta R_{82} & \Delta G_{84} & \Delta G_{83} & \Delta G_{82} & 0 & 0 & 1 & 0 & 0 \\ \Delta S_{85} & \Delta S_{84} & \Delta S_{83} & \Delta R_{85} & \Delta R_{84} & \Delta R_{83} & \Delta G_{85} & \Delta G_{84} & \Delta G_{83} & 0 & 0 & 0 & 1 & 0 \\ \Delta S_{86} & \Delta S_{85} & \Delta S_{84} & \Delta R_{86} & \Delta R_{85} & \Delta R_{84} & \Delta G_{86} & \Delta G_{85} & \Delta G_{84} & 0 & 0 & 0 & 0 & 1 \end{bmatrix}.$$

The second step begins with the computation of the new weighting matrix,

$$\hat\Phi = \text{Est.Asy.Var}[\sqrt{N}\,\bar m] = \frac{1}{N}\sum_{i=1}^{N} Z_i'\hat u_i\hat u_i'Z_i.$$

After multiplying and dividing by the implicit $(1/N)$ in the outside matrices, we obtain the estimator,

$$\hat\theta_{GMM} = \left[\left(\sum_{i=1}^{N} X_i'Z_i\right)\left(\sum_{i=1}^{N} Z_i'\hat u_i\hat u_i'Z_i\right)^{-1}\left(\sum_{i=1}^{N} Z_i'X_i\right)\right]^{-1} \times \left(\sum_{i=1}^{N} X_i'Z_i\right)\left(\sum_{i=1}^{N} Z_i'\hat u_i\hat u_i'Z_i\right)^{-1}\left(\sum_{i=1}^{N} Z_i'y_i\right)$$

$$= \left[\left(\sum_{i=1}^{N} X_i'Z_i\right)W\left(\sum_{i=1}^{N} Z_i'X_i\right)\right]^{-1}\left(\sum_{i=1}^{N} X_i'Z_i\right)W\left(\sum_{i=1}^{N} Z_i'y_i\right).$$

The estimator of the asymptotic covariance matrix for the estimator is the matrix in square brackets in the first line of the result.

The primary focus of interest in the study was not the estimator itself, but the lag length and whether certain lagged values of the independent variables appeared in each equation. These restrictions would be tested by using the GMM criterion function, which in this formulation would be (based on recomputing the residuals after GMM estimation)

$$q = \left(\sum_{i=1}^{n}\hat u_i'Z_i\right)W\left(\sum_{i=1}^{n} Z_i'\hat u_i\right).$$

Note that the weighting matrix is not (necessarily) recomputed. For purposes of testing hypotheses, the same weighting matrix should be used.

At this point, we will consider the appropriate lag length, $m$. The specification can be reduced simply by redefining $X$ to change the lag length. In order to test the specification, the weighting matrix must be kept constant for all restricted versions ($m = 2$ and $m = 1$) of the model.

The Dahlberg and Johansson data may be downloaded from the Journal of Applied Econometrics website; see Appendix Table F18.1. The authors provide the summary statistics for the raw data that are given in Table 18.2. The data used in the study and provided in the internet source are nominal values in Swedish Kroner, deflated by a municipality specific price index then converted to per capita values. Descriptive statistics for the raw and transformed data appear in Table 18.2.¹² Equations were estimated for all three variables, with maximum lag lengths of $m$ = 1, 2, and 3. (The authors did not provide the actual estimates.) Estimation is done using the methods developed by Ahn and Schmidt (1995), Arellano and Bover (1995) and Holtz-Eakin, Newey, and Rosen (1988), as described above. The estimates of the first specification given above are given in Table 18.3.

TABLE 18.2 Descriptive Statistics for Local Expenditure Data

Variable    Mean        Std. Deviation    Minimum     Maximum
Spending    18478.51    3174.36           12225.68    33883.25
Revenues    13422.56    3004.16           6228.54     29141.62
Grants      5236.03     1260.97           1570.64     12589.14

TABLE 18.3 Estimated Spending Equation

Variable         Estimate       Standard Error    t Ratio
Year 1983        −0.0036578     0.0002969         −12.32
Year 1984        −0.00049670    0.0004128         −1.20
Year 1985        0.00038085     0.0003094         1.23
Year 1986        0.00031469     0.0003282         0.96
Year 1987        0.00086878     0.0001480         5.87
Spending (t−1)   1.15493        0.34409           3.36
Revenues (t−1)   −1.23801       0.36171           −3.42
Grants (t−1)     0.016310       0.82419           0.02
Spending (t−2)   −0.0376625     0.22676           −0.17
Revenues (t−2)   0.0770075      0.27179           0.28
Grants (t−2)     1.55379        0.75841           2.05
Spending (t−3)   −0.56441       0.21796           −2.59
Revenues (t−3)   0.64978        0.26930           2.41
Grants (t−3)     1.78918        0.69297           2.58

Table 18.4 contains estimates of the model parameters for each of the three equations, and for the three lag lengths, as well as the value of the GMM criterion function for each model estimated. The base case for each model has $m = 3$. There are three restrictions implied by each reduction in the lag length. The critical chi-squared value for three degrees of freedom is 7.81 for 95 percent significance, so at this level, we find that the two-lag ($m = 2$) model is just barely accepted for the spending equation (the difference between the two criteria is 7.62), but clearly appropriate for the other two. Conditioned on $m = 2$, only the revenue model rejects the restriction of $m = 1$. As a final test, we might ask whether the data suggest that perhaps no lag structure at all is necessary. The GMM criterion values for the three equations with only the time dummy variables are 45.840, 57.908, and 62.042, respectively. Therefore, all three zero lag models are rejected.

¹² The data provided on the website and used in our computations were further transformed by dividing by 100,000.
alizedMethodofMoments555TABLE18.4EstimatedLagEquationsforSpending,Revenue,andGrantsExpenditureModelRevenueModelGrantModelm=3m=2m=1m=3m=2m=1m=3m=2m=1St−11.1550.87420.5562−0.1715−0.3117−0.1242−0.1675−0.1461−0.1958St−2−0.03770.2493—0.1621−0.0773—−0.0303−0.0304—St−3−0.5644——−0.1772——−0.0955——Rt−1−1.2380−0.8745−0.5328−0.01760.1863−0.02450.15780.14530.2343Rt−20.0770−0.2776—−0.03090.1368—0.04850.0175—Rt−30.6497——0.0034——0.0319——Gt−10.0163−0.42030.1275−0.36830.5425−0.0808−0.2381−0.2066−0.0559Gt−21.55380.1866—−2.71522.4621—−0.0492−0.0804—Gt−31.7892——0.0948——0.0598——q22.828730.452634.498630.539834.259053.250617.581020.541627.5927Amongtheinterestsinthisstudyweretheappropriatecriticalvaluestouseforthespecificationtestofthemomentrestriction.With16degreesoffreedom,thecriticalchi-squaredvaluefor95percentsignificanceis26.3,whichwouldsuggestthattherevenuesequationismisspecified.Usingabootstraptechnique,theauthorsfindthatamoreappropriatecriticalvalueleavesthespecificationintact.Finally,notethatthethree-equationmodelinthem=3columnsofTable18.4implyavectorautoregressionoftheformyt=1yt−1+2yt−2+3yt−3+vtwherey=(S,R,G).Wewillexplorethepropertiesandcharacteristicsofequa-tttttionsystemssuchasthisinourdiscussionoftimeseriesmodelsinChapter20.18.6SUMMARYANDCONCLUSIONSThegeneralizedmethodofmomentsprovidesanestimationframeworkthatincludesleastsquares,nonlinearleastsquares,instrumentalvariables,andmaximumlikelihood,andageneralclassofestimatorsthatextendsbeyondthese.Butitismorethanjustatheoreticalumbrella.TheGMMprovidesamethodofformulatingmodelsandimpliedestimatorswithoutmakingstrongdistributionalassumptions.Hall’smodelofhouseholdconsumptionisausefulexamplethatshowshowtheoptimizationconditionsofanunderlyingeconomictheoryproduceasetofdistributionfreeestimatingequations.Inthischapter,wefirstexaminedtheclassicalmethodofmoments.GMMasanestimatorisanextensionofthisstrategythatallowstheanalysttouseadditionalinformationbeyondthatnecessarytoidentifythemodel,inanoptimalfashion.Afterdefiningandestablishingthep
ropertiesoftheestimator,wethenturnedtoinferenceprocedures.ItisconvenientthattheGMMprocedureprovidescounterpartstothefamiliartrioofteststatistics,Wald,LM,andLR.Inthefinalsection,wedevelopedanexamplethatappearsatmanypointsintherecentappliedliterature,thedynamicpaneldatamodelwithindividualspecificeffects,andlaggedvaluesofthedependentvariable.Thischapterconcludesoursurveyofestimationtechniquesandmethodsinecono-metrics.Intheremainingchaptersofthebook,wewillexamineavarietyofapplications\nGreene-50240bookJune26,200215:6556CHAPTER18✦TheGeneralizedMethodofMomentsandmodelingtools,firstintimeseriesandmacroeconometricsinChapters19and20,thenindiscretechoicemodelsandlimiteddependentvariables,thestaplesofmicroe-conometrics,inChapters21and22.KeyTermsandConcepts•Analogestimation•LRstatistic•Ordercondition•Asymptoticproperties•Martingaledifference•Orthogonalityconditions•Centrallimittheoremsequence•Overidentifyingrestrictions•Centralmoments•Maximumlikelihood•Probabilitylimit•Consistentestimatorestimator•Randomsample•Dynamicpaneldatamodel•Meanvaluetheorem•Rankcondition•Empiricalmomentequation•Methodofmoment•Robustestimation•Ergodictheoremgeneratingfunctions•SlutskyTheorem•Eulerequation•Methodofmoments•Specificationteststatistic•Exactlyidentified•Methodofmoments•Sufficientstatistic•Exponentialfamilyestimators•Taylorseries•Generalizedmethodof•Minimumdistanceestimator•Uncenteredmomentmoments•Momentequation•Waldstatistic•Identification•Newey–Westestimator•Weightedleastsquares•Instrumentalvariables•Nonlinearinstrumental•LMstatisticvariableestimatorExercises1.Forthenormaldistributionµ=σ2k(2k)!/(k!2k)andµ=0,k=0,1,....Use2k2k+1thisresulttoanalyzethetwoestimators,m3m4b1=3/2andb2=2.m2m21nkwheremk=ni=1(xi−x¯).Thefollowingresultwillbeuseful:√√Asy.Cov[nmj,nmk]=µj+k−µjµk+jkµ2µj−1µk−1−jµj−1µk+1−kµk−1µj+1.Usethedeltamethodtoobtaintheasymptoticvariancesandcovarianceofthesetwofunctionsassumingthedataaredrawnfromanormaldistributionwithmeanµandvarianceσ2.(Hint:Undertheassumptions,thesamplemeanisaconsiste
ntestimatorofµ,soforpurposesofderivingasymptoticresults,thedifferencebetweenx¯andµmaybeignored.Assuch,nogeneralityislostbyassumingthemeaniszero,andproceedingfromthere.ObtainV,the3×3covariancematrixforthethreemoments,thenusethedeltamethodtoshowthatthecovariancematrixforthetwoestimatorsis60JVJ=024whereJisthe2×3matrixofderivatives.2.UsingtheresultsinExample18.7,estimatetheasymptoticcovariancematrixofthemethodofmomentsestimatorsofPandλbasedonmandm[Note:Youwill12needtousethedatainExampleC.1toestimateV.]\nGreene-50240bookJune26,200215:6CHAPTER18✦TheGeneralizedMethodofMoments5573.ExponentialFamiliesofDistributions.Foreachofthefollowingdistributions,determinewhetheritisanexponentialfamilybyexaminingthelog-likelihoodfunc-tion.Then,identifythesufficientstatistics.a.Normaldistributionwithmeanµandvarianceσ2.b.TheWeibulldistributioninExercise4inChapter17.c.ThemixturedistributioninExercise3inChapter17.4.Intheclassicalregressionmodelwithheteroscedasticity,whichismoreefficient,ordinaryleastsquaresorGMM?Obtainthetwoestimatorsandtheirrespectiveasymptoticcovariancematrices,thenproveyourassertion.5.ConsidertheprobitmodelanalyzedinSection17.8.Themodelstatesthatforgivenvectorofindependentvariables,Prob[y=1|x]=[xβ],Prob[y=0|x]=1−Prob[y=1|x].iiiiiiiWehaveconsideredmaximumlikelihoodestimationoftheparametersofthismodelatseveralpoints.Consider,instead,aGMMestimatorbasedontheresultthatE[y|x]=(xβ)iiiThissuggeststhatwemightbaseestimationontheorthogonalityconditionsE[(y−(xβ))x]=0iiiConstructaGMMestimatorbasedontheseresults.Notethatthisisnotthenon-linearleastsquaresestimator.Explain—whatwouldtheorthogonalityconditionsbefornonlinearleastsquaresestimationofthismodel?6.ConsiderGMMestimationofaregressionmodelasshownatthebeginningofExample18.8.LetW1betheoptimalweightingmatrixbasedonthemomentequations.LetW2besomeotherpositivedefinitematrix.Comparetheasymp-toticcovariancematricesofthetwoproposedestimators.ShowconclusivelythattheasymptoticcovariancematrixoftheestimatorbasedonW1isnotlargerthanthatbasedonW2
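The two-step computation used in the local government expenditure example of this chapter (unweighted instrumental variables first, then a weighting matrix built from the first-step residuals, then the criterion q evaluated with that same weighting matrix) can be sketched numerically. The following is only an illustrative sketch under simplifying assumptions that are not in the text: a single cross-section equation with one row per observation (so each $Z_i$ is a row vector rather than a matrix of panel instruments), simulated data, and the hypothetical function name `two_step_gmm`.

```python
import numpy as np

def two_step_gmm(y, X, Z):
    """Two-step linear GMM for moments E[z_i (y_i - x_i' theta)] = 0.

    Hypothetical helper for illustration; each observation is one row.
    """
    ZX, Zy = Z.T @ X, Z.T @ y
    # Step 1: unweighted IV, weighting matrix (sum z_i z_i')^{-1}.
    W1 = np.linalg.inv(Z.T @ Z)
    theta_iv = np.linalg.solve(ZX.T @ W1 @ ZX, ZX.T @ W1 @ Zy)
    # Step 2: weighting matrix from first-step residuals,
    # W = (sum z_i u_i^2 z_i')^{-1}.
    u = y - X @ theta_iv
    Zu = Z * u[:, None]
    W = np.linalg.inv(Zu.T @ Zu)
    A = ZX.T @ W @ ZX
    theta_gmm = np.linalg.solve(A, ZX.T @ W @ Zy)
    cov = np.linalg.inv(A)  # estimated asymptotic covariance matrix
    # Criterion: residuals recomputed at theta_gmm, W held fixed.
    g = Z.T @ (y - X @ theta_gmm)
    q = float(g @ W @ g)
    return theta_iv, theta_gmm, cov, q

# Simulated data (assumed design): one endogenous regressor, three instruments.
rng = np.random.default_rng(0)
n = 2000
z = rng.normal(size=(n, 3))
v = rng.normal(size=n)
x = z @ np.ones(3) + v
u_true = 0.5 * v + rng.normal(size=n)      # correlated with x through v
y = 1.0 + 2.0 * x + u_true
X = np.column_stack([np.ones(n), x])
Z = np.column_stack([np.ones(n), z])

theta_iv, theta_gmm, cov, q = two_step_gmm(y, X, Z)
print(theta_iv, theta_gmm, q)
```

To compare restricted specifications in the way the example describes, one would re-estimate with a reduced X while passing the same W from the unrestricted fit, rather than recomputing it; the sketch above recomputes W only because it fits a single model.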
19 MODELS WITH LAGGED VARIABLES

19.1 INTRODUCTION

This chapter begins our introduction to the analysis of economic time series. By most views, this field has become synonymous with empirical macroeconomics and the analysis of financial markets.¹ In this and the next chapter, we will consider a number of models and topics in which time and relationships through time play an explicit part in the formulation. Consider the dynamic regression model

\[
y_t = \beta_1 + \beta_2 x_t + \beta_3 x_{t-1} + \gamma y_{t-1} + \varepsilon_t. \tag{19-1}
\]

Models of this form specifically include as right-hand-side variables earlier as well as contemporaneous values of the regressors. It is also in this context that lagged values of the dependent variable appear as a consequence of the theoretical basis of the model rather than as a computational means of removing autocorrelation. There are several reasons why lagged effects might appear in an empirical model.

• In modeling the response of economic variables to policy stimuli, it is expected that there will be possibly long lags between policy changes and their impacts. The length of lag between changes in monetary policy and its impact on important economic variables such as output and investment has been a subject of analysis for several decades.
• Either the dependent variable or one of the independent variables is based on expectations. Expectations about economic events are usually formed by aggregating new information and past experience. Thus, we might write the expectation of a future value of variable x, formed this period, as
\[
x_t = E_t[x^*_{t+1} \mid z_t, x_{t-1}, x_{t-2}, \ldots] = g(z_t, x_{t-1}, x_{t-2}, \ldots).
\]
For example, forecasts of prices and income enter demand equations and consumption equations. (See Example 18.1 for an influential application.)
• Certain economic decisions are explicitly driven by a history of related activities. For example, energy demand by individuals is clearly a function not only of current prices and income, but also the accumulated stocks of energy-using capital. Even energy demand in the macroeconomy behaves in this fashion: the stock of automobiles and its attendant demand for gasoline is clearly driven by past prices of gasoline and automobiles. Other classic examples are the dynamic relationship between investment decisions and past appropriation decisions and the consumption of addictive goods such as cigarettes and theater performances.

¹ The literature in this area has grown at an impressive rate, and, more so than in any other area, it has become impossible to provide comprehensive surveys in general textbooks such as this one. Fortunately, specialized volumes have been produced that can fill this need at any level. Harvey (1990) has been in wide use for some time. Among the many other books written in the 1990s, three very useful works are Enders (1995), which presents the basics of time series analysis at an introductory level with several very detailed applications; Hamilton (1994), which gives a relatively technical but quite comprehensive survey of the field; and Lutkepohl (1993), which provides an extremely detailed treatment of the topics presented at the end of this chapter. Hamilton also surveys a number of the applications in the contemporary literature. Two references that are focused on financial econometrics are Mills (1993) and Tsay (2002). There are also a number of important references that are primarily limited to forecasting, including Diebold (1998a, 1998b) and Granger and Newbold (1996). A survey of recent research in many areas of time series analysis is Engle and McFadden (1994). An extensive, fairly advanced treatise that analyzes in great depth all the issues we touch on in this chapter is Hendry (1995). Finally, Patterson (2000) surveys most of the practical issues in time series and presents a large variety of useful and very detailed applications.

We begin with a general discussion of models containing lagged variables. In Section 19.2, we consider some methodological issues in the specification of dynamic regressions. In Sections 19.3 and 19.4, we describe a general dynamic model that encompasses some of the extensions and more formal models for time-series data that are presented in Chapter 20. Section 19.5 takes a closer look at some of the issues in model specification. Finally, Section 19.6 considers systems of dynamic equations. These are largely extensions of the models that we examined at the end of Chapter 15. But the interpretation is rather different here.

This chapter is generally not about methods of estimation. OLS and GMM estimation are usually routine in this context. Since we are examining time series data, conventional assumptions including ergodicity and stationarity will be made at the outset. In particular, in the general framework, we will assume that the multivariate stochastic process $(y_t, x_t, \varepsilon_t)$ is a stationary and ergodic process. As such, without further analysis, we will invoke the theorems discussed in Chapters 5, 12, 16, and 18 that support least squares and GMM as appropriate estimation techniques in this context. In most of what follows, in fact, in practical terms, the dynamic regression model can be treated as a linear regression model, and estimated by conventional methods (e.g., ordinary least squares or instrumental variables if $\varepsilon_t$ is autocorrelated). As noted, we will generally not return to the issue of estimation and inference theory except where new results are needed, such as in the discussion of nonstationary processes.

19.2 DYNAMIC REGRESSION MODELS

In some settings, economic agents respond not only to current values of independent variables but to past values as well. When effects persist over time, an appropriate model will include lagged variables. Example 19.1 illustrates a familiar case.

Example 19.1 A Structural Model of the Demand for Gasoline

Drivers demand gasoline not for direct consumption but as fuel for cars to provide a source of energy for transportation. Per capita demand for gasoline in any period, G/pop, is determined partly by the current price, Pg, and per capita income, Y/pop, which influence how intensively the existing stock of gasoline-using "capital," K, is used, and partly by the size and composition of the stock of cars and other vehicles. The capital stock is determined, in turn, by income, Y/pop; prices of the equipment such as new and used cars, Pnc and Puc; the price of alternative modes of transportation such as public transportation, Ppt; and past prices of gasoline as they influence forecasts of future gasoline prices. A structural model of these effects might appear as follows:

per capita demand: $G_t/pop_t = \alpha + \beta Pg_t + \delta Y_t/pop_t + \gamma K_t + u_t$,
stock of vehicles: $K_t = (1 - \Delta)K_{t-1} + I_t$, $\Delta$ = depreciation rate,
investment in new vehicles: $I_t = \theta Y_t/pop_t + \phi E_t[Pg_{t+1}] + \lambda_1 Pnc_t + \lambda_2 Puc_t + \lambda_3 Ppt_t$,
expected price of gasoline: $E_t[Pg_{t+1}] = w_0 Pg_t + w_1 Pg_{t-1} + w_2 Pg_{t-2}$.

The capital stock is the sum of all past investments, so it is evident that not only current income and prices, but all past values, play a role in determining K. When income or the price of gasoline changes, the immediate effect will be to cause drivers to use their vehicles more or less intensively. But, over time, vehicles are added to the capital stock, and some cars are replaced with more or less efficient ones. These changes take some time, so the full impact of income and price changes will not be felt for several periods. Two episodes in recent history have shown this effect clearly. For well over a decade following the 1973 oil shock, drivers gradually replaced their large, fuel-inefficient cars with smaller, less-fuel-intensive models. In the late 1990s in the United States, this process has visibly worked in reverse. As American drivers have become accustomed to steadily rising incomes and steadily falling real gasoline prices, the downsized, efficient coupes and sedans of the 1980s have yielded the highways to a tide of ever-larger, six- and eight-cylinder sport utility vehicles, whose size and power can reasonably be characterized as astonishing.

19.2.1 LAGGED EFFECTS IN A DYNAMIC MODEL

The general form of a dynamic regression model is

\[
y_t = \alpha + \sum_{i=0}^{\infty} \beta_i x_{t-i} + \varepsilon_t. \tag{19-2}
\]

In this model, a one-time change in x at any point in time will affect $E[y_s \mid x_t, x_{t-1}, \ldots]$ in every period thereafter. When it is believed that the duration of the lagged effects is extremely long (for example, in the analysis of monetary policy), infinite lag models that have effects that gradually fade over time are quite common. But models are often constructed in which changes in x cease to have any influence after a fairly small number of periods. We shall consider these finite lag models first.

Marginal effects in the static classical regression model are one-time events. The response of y to a change in x is assumed to be immediate and to be complete at the end of the period of measurement. In a dynamic model, the counterpart to a marginal effect is the effect of a one-time change in $x_t$ on the equilibrium of $y_t$. If the level of $x_t$ has been unchanged from, say, $\bar{x}$ for many periods prior to time t, then the equilibrium value of $E[y_t \mid x_t, x_{t-1}, \ldots]$ (assuming that it exists) will be

\[
\bar{y} = \alpha + \sum_{i=0}^{\infty} \beta_i \bar{x} = \alpha + \bar{x}\sum_{i=0}^{\infty} \beta_i, \tag{19-3}
\]

where $\bar{x}$ is the permanent value of $x_t$. For this value to be finite, we require that

\[
\left|\sum_{i=0}^{\infty} \beta_i\right| < \infty. \tag{19-4}
\]

Consider the effect of a unit change in $\bar{x}$ occurring in period s. To focus ideas, consider the earlier example of demand for gasoline and suppose that $x_t$ is the unit price. Prior to the oil shock, demand had reached an equilibrium consistent with accumulated habits, experience with stable real prices, and the accumulated stocks of vehicles. Now suppose that the price of gasoline, Pg, rises permanently from $\overline{Pg}$ to $\overline{Pg} + 1$ in period s. The path to the new equilibrium might appear as shown in Figure 19.1. The short-run effect is the one that occurs in the same period as the change in x. This effect is $\beta_0$ in the figure.

[FIGURE 19.1 Lagged Adjustment. Demand plotted against time, moving from the old equilibrium $D_0$ to the new equilibrium $D_1$.]

DEFINITION 19.1 Impact Multiplier
$\beta_0$ = impact multiplier = short-run multiplier.

DEFINITION 19.2 Cumulated Effect
The accumulated effect $\tau$ periods later of an impulse at time t is $\boldsymbol{\beta}_\tau = \sum_{i=0}^{\tau} \beta_i$.

In Figure 19.1, we see that the total effect of a price change in period t after three periods have elapsed will be $\beta_0 + \beta_1 + \beta_2 + \beta_3$.

The difference between the old equilibrium $D_0$ and the new one $D_1$ is the sum of the individual period effects. The long-run multiplier is this total effect.

DEFINITION 19.3 Equilibrium Multiplier
$\beta = \sum_{i=0}^{\infty} \beta_i$ = equilibrium multiplier = long-run multiplier.

Since the lag coefficients are regression coefficients, their scale is determined by the scales of the variables in the model. As such, it is often useful to define the

\[
\text{lag weights: } w_i = \frac{\beta_i}{\sum_{j=0}^{\infty} \beta_j} \tag{19-5}
\]

so that $\sum_{i=0}^{\infty} w_i = 1$, and to rewrite the model as

\[
y_t = \alpha + \beta \sum_{i=0}^{\infty} w_i x_{t-i} + \varepsilon_t. \tag{19-6}
\]

(Note the equation for the expected price in Example 19.1.) Two useful statistics, based on the lag weights, that characterize the period of adjustment to a new equilibrium are the median lag = smallest $q^*$ such that $\sum_{i=0}^{q^*} w_i \ge 0.5$ and the mean lag = $\sum_{i=0}^{\infty} i\,w_i$.²

² If the lag coefficients do not all have the same sign, then these results may not be meaningful. In some contexts, lag coefficients with different signs may be taken as an indication that there is a flaw in the specification of the model.

19.2.2 THE LAG AND DIFFERENCE OPERATORS

A convenient device for manipulating lagged variables is the lag operator,

\[
L x_t = x_{t-1}.
\]

Some basic results are $La = a$ if a is a constant and $L(Lx_t) = L^2 x_t = x_{t-2}$. Thus, $L^p x_t = x_{t-p}$, $L^q(L^p x_t) = L^{p+q} x_t = x_{t-p-q}$, and $(L^p + L^q)x_t = x_{t-p} + x_{t-q}$. By convention, $L^0 x_t = 1x_t = x_t$. A related operation is the first difference,

\[
\Delta x_t = x_t - x_{t-1}.
\]

Obviously, $\Delta x_t = (1 - L)x_t$ and $x_t = x_{t-1} + \Delta x_t$. These two operations can be usefully combined, for example, as in

\[
\Delta^2 x_t = (1 - L)^2 x_t = (1 - 2L + L^2)x_t = x_t - 2x_{t-1} + x_{t-2}.
\]

Note that

\[
(1 - L)^2 x_t = (1 - L)(1 - L)x_t = (1 - L)(x_t - x_{t-1}) = (x_t - x_{t-1}) - (x_{t-1} - x_{t-2}).
\]

The dynamic regression model can be written

\[
y_t = \alpha + \sum_{i=0}^{\infty} \beta_i L^i x_t + \varepsilon_t = \alpha + B(L)x_t + \varepsilon_t,
\]

where B(L) is a polynomial in L, $B(L) = \beta_0 + \beta_1 L + \beta_2 L^2 + \cdots$. A polynomial in the lag operator that reappears in many contexts is

\[
A(L) = 1 + aL + (aL)^2 + (aL)^3 + \cdots = \sum_{i=0}^{\infty} (aL)^i.
\]

If $|a| < 1$, then

\[
A(L) = \frac{1}{1 - aL}.
\]

A distributed lag model in the form

\[
y_t = \alpha + \beta \sum_{i=0}^{\infty} \gamma^i L^i x_t + \varepsilon_t
\]

can be written

\[
y_t = \alpha + \beta(1 - \gamma L)^{-1} x_t + \varepsilon_t,
\]

if $|\gamma| < 1$. This form is called the moving-average form or distributed lag form. If we multiply through by $(1 - \gamma L)$ and collect terms, then we obtain the autoregressive form,

\[
y_t = \alpha(1 - \gamma) + \beta x_t + \gamma y_{t-1} + (1 - \gamma L)\varepsilon_t.
\]

In more general terms, consider the pth order autoregressive model,

\[
y_t = \alpha + \beta x_t + \gamma_1 y_{t-1} + \gamma_2 y_{t-2} + \cdots + \gamma_p y_{t-p} + \varepsilon_t,
\]

which may be written

\[
C(L)y_t = \alpha + \beta x_t + \varepsilon_t,
\]

where

\[
C(L) = (1 - \gamma_1 L - \gamma_2 L^2 - \cdots - \gamma_p L^p).
\]

Can this equation be "inverted" so that $y_t$ is written as a function only of current and past values of $x_t$ and $\varepsilon_t$? By successively substituting the corresponding autoregressive equation for $y_{t-1}$ in that for $y_t$, then likewise for $y_{t-2}$ and so on, it would appear so. However, it is also clear that the resulting distributed lag form will have an infinite number of coefficients. Formally, the operation just described amounts to writing

\[
y_t = [C(L)]^{-1}(\alpha + \beta x_t + \varepsilon_t) = A(L)(\alpha + \beta x_t + \varepsilon_t).
\]

It will be of interest to be able to solve for the elements of A(L) (see, for example, Section 19.6.6). By this arrangement, it follows that $C(L)A(L) = 1$ where

\[
A(L) = (\alpha_0 L^0 + \alpha_1 L + \alpha_2 L^2 + \cdots).
\]

By collecting like powers of L in

\[
(1 - \gamma_1 L - \gamma_2 L^2 - \cdots - \gamma_p L^p)(\alpha_0 L^0 + \alpha_1 L + \alpha_2 L^2 + \cdots) = 1,
\]

we find that a recursive solution for the $\alpha$ coefficients is

\[
\begin{aligned}
L^0 &: \alpha_0 = 1 \\
L^1 &: \alpha_1 - \gamma_1\alpha_0 = 0 \\
L^2 &: \alpha_2 - \gamma_1\alpha_1 - \gamma_2\alpha_0 = 0 \\
L^3 &: \alpha_3 - \gamma_1\alpha_2 - \gamma_2\alpha_1 - \gamma_3\alpha_0 = 0 \\
L^4 &: \alpha_4 - \gamma_1\alpha_3 - \gamma_2\alpha_2 - \gamma_3\alpha_1 - \gamma_4\alpha_0 = 0 \\
&\;\;\vdots \\
L^p &: \alpha_p - \gamma_1\alpha_{p-1} - \gamma_2\alpha_{p-2} - \cdots - \gamma_p\alpha_0 = 0
\end{aligned}
\tag{19-7}
\]

and, thereafter,

\[
L^q: \alpha_q - \gamma_1\alpha_{q-1} - \gamma_2\alpha_{q-2} - \cdots - \gamma_p\alpha_{q-p} = 0.
\]

After a set of p − 1 starting values, the $\alpha$ coefficients obey the same difference equation as $y_t$ does in the dynamic equation. One problem remains. For the given set of values, the preceding gives no assurance that the solution for $\alpha_q$ does not ultimately explode. The equation system above is not necessarily stable for all values of $\gamma_j$ (though it certainly is for some). If the system is stable in this sense, then the polynomial C(L) is said to be invertible. The necessary conditions are precisely those discussed in Section 19.4.3, so we will defer completion of this discussion until then.

Finally, two useful results are

\[
B(1) = \beta_0 1^0 + \beta_1 1^1 + \beta_2 1^2 + \cdots = \beta = \text{long-run multiplier}
\]

and

\[
B'(1) = [dB(L)/dL]\big|_{L=1} = \sum_{i=0}^{\infty} i\beta_i.
\]

It follows that $B'(1)/B(1)$ = mean lag.

19.2.3 SPECIFICATION SEARCH FOR THE LAG LENGTH

Various procedures have been suggested for determining the appropriate lag length in a dynamic model such as

\[
y_t = \alpha + \sum_{i=0}^{p} \beta_i x_{t-i} + \varepsilon_t. \tag{19-8}
\]

One must be careful about a purely significance-based specification search. Let us suppose that there is an appropriate, "true" value of p > 0 that we seek. A simple-to-general approach to finding the right lag length would depart from a model with only the current value of the independent variable in the regression, and add deeper lags until a simple t test suggested that the last one added is statistically insignificant. The problem with such an approach is that at any level at which the number of included lagged variables is less than p, the estimator of the coefficient vector is biased and inconsistent. [See the omitted variable formula (8-4).] The asymptotic covariance matrix is biased as well, so statistical inference on this basis is unlikely to be successful. A general-to-simple approach would begin from a model that contains more than p lagged values; it is assumed that though the precise value of p is unknown, the analyst can posit a maintained value that should be larger than p. Least squares or instrumental variables regression of y on a constant and (p + d) lagged values of x consistently estimates $\theta = [\alpha, \beta_0, \beta_1, \ldots, \beta_p, 0, 0, \ldots]$.

Since models with lagged values are often used for forecasting, researchers have tended to look for measures that have produced better results for assessing "out of sample" prediction properties. The adjusted $R^2$ [see Section 3.5.1] is one possibility. Others include the Akaike (1973) information criterion, AIC(p),

\[
AIC(p) = \ln\frac{e'e}{T} + \frac{2p}{T} \tag{19-9}
\]

and Schwarz's criterion, SC(p):

\[
SC(p) = AIC(p) + \frac{p}{T}(\ln T - 2). \tag{19-10}
\]

(See Section 8.4.) If some maximum P is known, then p < P can be chosen to minimize AIC(p) or SC(p).³ An alternative approach, also based on a known P, is to do sequential F tests on the last P > p coefficients, stopping when the test rejects the hypothesis that the coefficients are jointly zero. Each of these approaches has its flaws and virtues. The Akaike information criterion retains a positive probability of leading to overfitting even as T → ∞. In contrast, SC(p) has been seen to lead to underfitting in some finite sample cases. They do avoid, however, the inference problems of sequential estimators. The sequential F tests require successive revision of the significance level to be appropriate, but they do have a statistical underpinning.⁴

³ For further discussion and some alternative measures, see Geweke and Meese (1981), Amemiya (1985, pp. 146–147), Diebold (1998a, pp. 85–91), and Judge et al. (1985, pp. 353–355).
⁴ See Pagano and Hartley (1981) and Trivedi and Pagan (1979).

19.3 SIMPLE DISTRIBUTED LAG MODELS

Before examining some very general specifications of the dynamic regression, we briefly consider two specific frameworks: finite lag models, which specify a particular value of the lag length p in (19-8), and an infinite lag model, which emerges from a simple model of expectations.

19.3.1 FINITE DISTRIBUTED LAG MODELS

An unrestricted finite distributed lag model would be specified as

\[
y_t = \alpha + \sum_{i=0}^{p} \beta_i x_{t-i} + \varepsilon_t. \tag{19-11}
\]

We assume that $x_t$ satisfies the conditions discussed in Section 5.2. The assumption that there are no other regressors is just a convenience. We also assume that $\varepsilon_t$ is distributed with mean zero and variance $\sigma_\varepsilon^2$. If the lag length p is known, then (19-11) is a classical regression model. Aside from questions about the properties of the independent variables, the usual estimation results apply.⁵ But the appropriate length of the lag is rarely, if ever, known, so one must undertake a specification search, with all its pitfalls. Worse yet, least squares may prove to be rather ineffective because (1) time series are sometimes fairly short, so (19-11) will consume an excessive number of degrees of freedom;⁶ (2) $\varepsilon_t$ will usually be serially correlated; and (3) multicollinearity is likely to be quite severe.

⁵ The question of whether the regressors are well behaved or not becomes particularly pertinent in this setting, especially if one or more of them happen to be lagged values of the dependent variable. In what follows, we shall assume that the Grenander conditions discussed in Section 5.2.1 are met. We thus assume that the usual asymptotic results for the classical or generalized regression model will hold.
⁶ Even when the time series is long, the model may be problematic; in this instance, the assumption that the same model can be used, without structural change, through the entire time span becomes increasingly suspect the longer the time series is. See Sections 7.4 and 7.7 for analysis of this issue.

Restricted lag models which parameterize the lag coefficients as functions of a few underlying parameters are a practical approach to the problem of fitting a model with long lags in a relatively short time series. An example is the polynomial distributed lag (PDL) [or Almon (1965) lag in reference to S. Almon, who first proposed the method in econometrics]. The polynomial model assumes that the true distribution of lag coefficients can be well approximated by a low-order polynomial,

\[
\beta_i = \alpha_0 + \alpha_1 i + \alpha_2 i^2 + \cdots + \alpha_q i^q, \quad i = 0, 1, \ldots, p > q. \tag{19-12}
\]

After substituting (19-12) in (19-11) and collecting terms, we obtain

\[
\begin{aligned}
y_t &= \gamma + \alpha_0\left(\sum_{i=0}^{p} i^0 x_{t-i}\right) + \alpha_1\left(\sum_{i=0}^{p} i^1 x_{t-i}\right) + \cdots + \alpha_q\left(\sum_{i=0}^{p} i^q x_{t-i}\right) + \varepsilon_t \\
&= \gamma + \alpha_0 z_{0t} + \alpha_1 z_{1t} + \cdots + \alpha_q z_{qt} + \varepsilon_t.
\end{aligned}
\tag{19-13}
\]

Each $z_{jt}$ is a linear combination of the current and p lagged values of $x_t$. With the assumption of strict exogeneity of $x_t$, $\gamma$ and $(\alpha_0, \alpha_1, \ldots, \alpha_q)$ can be estimated by ordinary or generalized least squares. The parameters of the regression model, $\beta_i$, and asymptotic standard errors for the estimators can then be obtained using the delta method (see Section D.2.7).

The polynomial lag model and other tightly structured finite lag models are only infrequently used in contemporary applications. They have the virtue of simplicity, although modern software has made this quality a modest virtue. The major drawback is that they impose strong restrictions on the functional form of the model and thereby often induce autocorrelation that is essentially an artifact of the missing variables and restrictive functional form in the equation. They remain useful tools in some forecasting settings and analysis of markets, as in Example 19.3, but in recent work in macroeconomic and financial modeling, where most of this sort of analysis takes place, the availability of ample data has made restrictive specifications such as the PDL less attractive than other tools.

19.3.2 AN INFINITE LAG MODEL: THE GEOMETRIC LAG MODEL

There are cases in which the distributed lag models the accumulation of information. The formation of expectations is an example. In these instances, intuition suggests that the most recent past will receive the greatest weight and that the influence of past observations will fade uniformly with the passage of time. The geometric lag model is often used for these settings. The general form of the model is

\[
\begin{aligned}
y_t &= \alpha + \beta\sum_{i=0}^{\infty}(1 - \lambda)\lambda^i x_{t-i} + \varepsilon_t, \quad 0 < \lambda < 1, \\
&= \alpha + \beta B(L)x_t + \varepsilon_t,
\end{aligned}
\tag{19-14}
\]

where

\[
B(L) = (1 - \lambda)(1 + \lambda L + \lambda^2 L^2 + \lambda^3 L^3 + \cdots) = \frac{1 - \lambda}{1 - \lambda L}.
\]

The lag coefficients are $\beta_i = \beta(1 - \lambda)\lambda^i$. The model incorporates infinite lags, but it assigns arbitrarily small weights to the distant past. The lag weights decline geometrically;

\[
w_i = (1 - \lambda)\lambda^i, \quad 0 \le w_i < 1.
\]

The mean lag is

\[
\bar{w} = \frac{B'(1)}{B(1)} = \frac{\lambda}{1 - \lambda}.
\]

The median lag is $p^*$ such that $\sum_{i=0}^{p^*} w_i = 0.5$. We can solve for $p^*$ by using the result

\[
\sum_{i=0}^{p} \lambda^i = \frac{1 - \lambda^{p+1}}{1 - \lambda}.
\]

Thus,

\[
p^* = \frac{\ln 0.5}{\ln \lambda} - 1.
\]

The impact multiplier is $\beta(1 - \lambda)$. The long-run multiplier is $\beta\sum_{i=0}^{\infty}(1 - \lambda)\lambda^i = \beta$. The equilibrium value of $y_t$ would be found by fixing $x_t$ at $\bar{x}$ and $\varepsilon_t$ at zero in (19-14), which produces $\bar{y} = \alpha + \beta\bar{x}$.

The geometric lag model can be motivated with an economic model of expectations. We begin with a regression in an expectations variable such as an expected future price based on information available at time t, $x^*_{t+1|t}$, and perhaps a second regressor, $w_t$,

\[
y_t = \alpha + \beta x^*_{t+1|t} + \delta w_t + \varepsilon_t,
\]

and a mechanism for the formation of the expectation,

\[
x^*_{t+1|t} = \lambda x^*_{t|t-1} + (1 - \lambda)x_t = \lambda L x^*_{t+1|t} + (1 - \lambda)x_t. \tag{19-15}
\]

The currently formed expectation is a weighted average of the expectation in the previous period and the most recent observation. The parameter $\lambda$ is the adjustment coefficient. If $\lambda$ equals 1, then the current datum is ignored and expectations are never revised. A value of zero characterizes a strict pragmatist who forgets the past immediately. The expectation variable can be written as

\[
x^*_{t+1|t} = \frac{1 - \lambda}{1 - \lambda L}x_t = (1 - \lambda)[x_t + \lambda x_{t-1} + \lambda^2 x_{t-2} + \cdots]. \tag{19-16}
\]

Inserting (19-16) into the regression equation produces the geometric distributed lag model,

\[
y_t = \alpha + \beta(1 - \lambda)[x_t + \lambda x_{t-1} + \lambda^2 x_{t-2} + \cdots] + \delta w_t + \varepsilon_t.
\]

The geometric lag model can be estimated by nonlinear least squares. Rewrite it as

\[
y_t = \alpha + \gamma z_t(\lambda) + \delta w_t + \varepsilon_t, \quad \gamma = \beta(1 - \lambda). \tag{19-17}
\]

The constructed variable $z_t(\lambda)$ obeys the recursion $z_t(\lambda) = x_t + \lambda z_{t-1}(\lambda)$. For the first observation, we use $z_1(\lambda) = x^*_{1|0} = x_1/(1 - \lambda)$. If the sample is moderately long, then assuming that $x_t$ was in long-run equilibrium, although it is an approximation, will not unduly affect the results. One can then scan over the range of $\lambda$ from zero to one to locate the value that minimizes the sum of squares. Once the minimum is located, an estimate of the asymptotic covariance matrix of the estimators of $(\alpha, \gamma, \delta, \lambda)$ can be found using (9-9) and Theorem 9.2. For the regression function $h_t(\text{data} \mid \alpha, \gamma, \delta, \lambda)$, $x^0_{t1} = 1$, $x^0_{t2} = z_t(\lambda)$, and $x^0_{t3} = w_t$. The derivative with respect to $\lambda$ can be computed by using the recursion $d_t(\lambda) = \partial z_t(\lambda)/\partial\lambda = z_{t-1}(\lambda) + \lambda\,\partial z_{t-1}(\lambda)/\partial\lambda$. If $z_1 = x_1/(1 - \lambda)$, then $d_1(\lambda) = z_1/(1 - \lambda)$. Then, $x^0_{t4} = d_t(\lambda)$. Finally, we estimate $\beta$ from the relationship $\beta = \gamma/(1 - \lambda)$ and use the delta method to estimate the asymptotic standard error.

For purposes of estimating long- and short-run elasticities, researchers often use a different form of the geometric lag model. The partial adjustment model describes the desired level of $y_t$,

\[
y^*_t = \alpha + \beta x_t + \delta w_t + \varepsilon_t,
\]

and an adjustment equation,

\[
y_t - y_{t-1} = (1 - \lambda)(y^*_t - y_{t-1}).
\]

If we solve the second equation for $y_t$ and insert the first expression for $y^*_t$, then we obtain

\[
\begin{aligned}
y_t &= \alpha(1 - \lambda) + \beta(1 - \lambda)x_t + \delta(1 - \lambda)w_t + \lambda y_{t-1} + (1 - \lambda)\varepsilon_t \\
&= \alpha' + \beta' x_t + \delta' w_t + \lambda y_{t-1} + \varepsilon'_t.
\end{aligned}
\]

This formulation offers a number of significant practical advantages. It is intrinsically linear in the parameters (unrestricted), and its disturbance is nonautocorrelated if $\varepsilon_t$ was to begin with. As such, the parameters of this model can be estimated consistently and efficiently by ordinary least squares. In this revised formulation, the short-run multipliers for $x_t$ and $w_t$ are $\beta'$ and $\delta'$. The long-run effects are $\beta = \beta'/(1 - \lambda)$ and $\delta = \delta'/(1 - \lambda)$. With the variables in logs, these effects are the short- and long-run elasticities.

Example 19.2 Expectations Augmented Phillips Curve

In Example 12.3, we estimated an expectations augmented Phillips curve of the form

\[
\Delta p_t - E[\Delta p_t \mid \Psi_{t-1}] = \beta[u_t - u^*] + \varepsilon_t.
\]

This model assumes a particularly simple model of expectations, $E[\Delta p_t \mid \Psi_{t-1}] = \Delta p_{t-1}$. The least squares results for this equation were

\[
\Delta p_t - \Delta p_{t-1} = \underset{(0.7405)}{0.49189} - \underset{(0.1257)}{0.090136}\,u_t + e_t, \qquad R^2 = 0.002561,\; T = 201.
\]

The implied estimate of the natural rate of unemployment is −(0.49189/−0.090136) or about 5.46 percent. Suppose we allow expectations to be formulated less pragmatically with the expectations model in (19-15). For this setting, this would be

\[
E[\Delta p_t \mid \Psi_{t-1}] = \lambda E[\Delta p_{t-1} \mid \Psi_{t-2}] + (1 - \lambda)\Delta p_{t-1}.
\]

The strict pragmatist has $\lambda = 0.0$. Using the method set out earlier, we would compute this for different values of $\lambda$, recompute the dependent variable in the regression, and locate the value of $\lambda$ which produces the lowest sum of squares. Figure 19.2 shows the sum of squares for the values of $\lambda$ ranging from 0.0 to 1.0.

[FIGURE 19.2 Sums of Squares for Phillips Curve Estimates. Residual sum of squares, S(λ), plotted against λ from 0.0 to 1.0.]

The minimum value of the sum of squares occurs at $\lambda = 0.66$. The least squares regression results, with the expectation computed at this value of $\lambda$, are

\[
\Delta p_t - E[\Delta p_t \mid \Psi_{t-1}] = \underset{(0.6617)}{1.69453} - \underset{(0.11125)}{0.30427}\,u_t + e_t, \qquad T = 201.
\]

The estimated standard errors are computed using the method described earlier for the nonlinear regression. The extra variable described in the paragraph after (19-17) accounts for the estimated $\lambda$. The estimated asymptotic covariance matrix is then computed using $(e'e/201)[W'W]^{-1}$ where $w_1 = 1$, $w_2 = u_t$ and $w_3 = \partial E[\Delta p_t \mid \Psi_{t-1}]/\partial\lambda$. The estimated standard error for $\lambda$ is 0.04610. Since this is highly statistically significantly different from zero (t = 14.315), we would reject the simple model. Finally, the implied estimate of the natural rate of unemployment is −(1.69453/−0.30427) or about 5.57 percent. The estimated asymptotic covariance of the slope and constant term is −0.0720293, so, using this value and the estimated standard errors given above and the delta method, we obtain an estimated standard error for this estimate of 0.5467. Thus, a confidence interval for the natural rate of unemployment based on these results would be (4.49%, 6.64%), which is in line with our prior expectations. There are two things to note about these results. First, since the dependent variables are different, we cannot compare the $R^2$s of the models with $\lambda = 0.00$ and $\lambda = 0.66$. But the sums of squares for the two models can be compared; they are 1592.32 and 1112.89, so the second model fits far better. One of the payoffs is the much narrower confidence interval for the natural rate. The counterpart to the one given above when $\lambda = 0.00$ is (1.13%, 9.79%). No doubt the model could be improved still further by expanding the equation. (This is considered in the exercises.)

TABLE 19.1  Estimated Distributed Lag Models

                                  Expectations            Partial Adjustment
Coefficient     Unrestricted    Estimated    Derived     Estimated    Derived
Constant          −18.165       −18.080                   −5.133      −14.102
Ln Pnc              0.190        −0.0592                  −0.139       −0.382
Ln Puc              0.0802        0.370                    0.126        0.346
Ln Ppt             −0.0754        0.116                    0.051        0.140
Trend              −0.0336       −0.0399                  −0.0106      −0.029
Ln Pg              −0.209          --        −0.171*      −0.118       −0.118
Ln Pg[−1]          −0.133          --        −0.113         --         −0.075
Ln Pg[−2]           0.0820         --        −0.074         --         −0.048
Ln Pg[−3]           0.0026         --        −0.049         --         −0.030
Ln Pg[−4]          −0.0585         --        −0.032         --         −0.019
Ln Pg[−5]           0.0455         --        −0.021         --         −0.012
Ln income           0.785          --         0.877*       0.772        0.772
Ln Y[−1]           −0.0138         --         0.298         --          0.491
Ln Y[−2]            0.696          --         0.101         --          0.312
Ln Y[−3]            0.0876         --         0.034         --          0.199
Ln Y[−4]            0.257          --         0.012         --          0.126
Ln Y[−5]            0.779          --         0.004         --          0.080
Zt(price G)          --          −0.171        --          0.051
Zt(income)           --           0.877        --
Ln G/pop[−1]         --            --                      0.636
β                    --          −0.502        --
γ                                 2.580        --
λ                    --           0.66                     0.636
e′e             0.001649509   0.0098409286               0.01250433
T                    31            36                       35
* Estimated directly.

Example 19.3 Price and Income Elasticities of Demand for Gasoline

We have extended the gasoline demand equation estimated in Examples 2.3, 4.4, and 7.6 to allow for dynamic effects. Table 19.1 presents estimates of three distributed lag models for gasoline consumption. The unrestricted model allows 5 years of adjustment in the price and income effects. The expectations model includes the same distributed lag ($\lambda$) on price and income but different long-run multipliers ($\beta_{Pg}$ and $\beta_I$). [Note, for this formulation, that the extra regressor used in computing the asymptotic covariance matrix is $d_t(\lambda) = \beta_{Pg}d_{price}(\lambda) + \beta_I d_{income}(\lambda)$.] Finally, the partial adjustment model implies lagged effects for all the variables in the model. To facilitate comparison, the constant and the first four slope coefficients in the partial adjustment model have been divided by the estimate of $(1 - \lambda)$. The implied long- and short-run price and income elasticities are shown in Table 19.2. The ancillary elasticities for the prices of new and used cars and for public transportation vary surprisingly widely across the models, but the price and income elasticities are quite stable. As might be expected, the best fit to the data is provide
dbytheunrestrictedlagmodel.Thesumofsquaresisfarlowerforthisformthanfortheothertwo.Adirectcomparisonisdifficult,becausethemodelsarenotnestedandbecausetheyarebasedondifferentnumbersofobservations.Asanapproximation,wecancomputethesumofsquaredresidualsfor\nGreene-50240bookJune26,200221:55CHAPTER19✦ModelswithLaggedVariables571TABLE19.2EstimatedElasticitiesShortRunLongRunPriceIncomePriceIncomeUnrestrictedmodel−0.2090.785−0.2702.593Expectationsmodel−0.1700.901−0.5022.580Partialadjustmentmodel−0.1180.772−0.3242.118theestimateddistributedlagmodel,usingonlythe31observationsusedtocomputetheunrestrictedmodel.Thissumofsquaresis0.009551995087.AnFstatisticbasedonthissumofsquareswouldbe(0.009551995−0.0016495090)/9F[17−8,31−17]==7.4522.0.0016495090/14The95percentcriticalvalueforthisdistributionis2.646,sotherestrictionsofthedistributedlagmodelwouldberejected.Thesamecomputation(samedegreesoffreedom)forthepartialadjustmentmodelproducesasumofsquaresof0.01215449andanFof9.68.Onceagain,theseareonlyroughindicators,buttheydosuggestthattherestrictionsofthedistributedlagmodelsareinappropriateinthecontextofthemodelwithfivelaggedvaluesforpriceandincome.19.4AUTOREGRESSIVEDISTRIBUTEDLAGMODELSBoththefinitelagmodelsandthegeometriclagmodelimposestrong,possiblyin-correctrestrictionsonthelaggedresponseofthedependentvariabletochangesinanindependentvariable.Averygeneralcompromisethatalsoprovidesausefulplat-formforstudyinganumberofinterestingmethodologicalissuesistheautoregressivedistributedlag(ARDL)model,pryt=µ+γiyt−i+βjxt−j+δwt+εt,(19-18)i=1j=0inwhichεtisassumedtobeseriallyuncorrelatedandhomoscedastic(wewillrelaxboththeseassumptionsinChapter20).WecanwritethismorecompactlyasC(L)yt=µ+B(L)xt+δwt+εtbydefiningpolynomialsinthelagoperator,C(L)=1−γL−γL2−···−γLp12pandB(L)=β+βL+βL2+···+βLr.012rThemodelinthisformisdenotedARDL(p,r)toindicatetheordersofthetwopoly-nomialsinL.Thepartialadjustmentmodelestimatedintheprevioussectionisthespecialcaseinwhichpequals1andrequals0.Anumberofotherspecialcasesarealsointeresting,in
cludingthefamiliarmodelofautocorrelation(p=1,r=1,β1=−γ1β0),theclassicalregressionmodel(p=0,r=0),andsoon.\nGreene-50240bookJune26,200221:55572CHAPTER19✦ModelswithLaggedVariables19.4.1ESTIMATIONOFTHEARDLMODELSaveforthepresenceofthestochasticright-hand-sidevariables,theARDLisalinearmodelwithaclassicaldisturbance.Assuch,ordinaryleastsquaresistheefficientesti-mator.Thelaggeddependentvariabledoespresentacomplication,butweconsideredthisinSection5.4.Absentanyobviousviolationsoftheassumptionsthere,leastsquarescontinuestobetheestimatorofchoice.Conventionaltestingproceduresare,asbefore,asymptoticallyvalidaswell.Thus,fortestinglinearrestrictions,theWaldstatisticcanbeused,althoughtheFstatisticisgenerallypreferableinfinitesamplesbecauseofitsmoreconservativecriticalvalues.Onesubtlecomplicationinthemodelhasattractedalargeamountofattentionintherecentliterature.IfC(1)=0,thenthemodelisactuallyinestimable.Thisfactisevidentinthedistributedlagform,whichincludesatermµ/C(1).Iftheequivalentconditioniγi=1holds,thenthestochasticdifferenceequationisunstableandahostofotherproblemsariseaswell.Thisimplicationsuggeststhatonemightbeinterestedintestingthisspecificationasahypothesisinthecontextofthemodel.Thisrestrictionmightseemtobeasimplelinearconstraintonthealternative(unrestricted)modelin(19-18).Underthenullhypothesis,however,theconventionalteststatisticsdonothavethefamiliardistributions.Theformalderivationiscomplicated[intheextreme,seeDickeyandFuller(1979)forexample],butintuitionshouldsuggestthereason.Underthenullhypothesis,thedifferenceequationisexplosive,soourassumptionsaboutwellbehaveddatacannotbemet.ConsiderasimpleARDL(1,0)exampleandsimplifyitevenfurtherwithB(L)=0.Then,yt=µ+γyt−1+εt.Ifγequals1,thenyt=µ+yt−1+εt.Assumingwestartthetimeseriesattimet=1,yt=tµ+sεs=tµ+vt.Theconditionalmeaninthisrandomwalkwithdriftmodelisincreasingwithoutlimit,sotheunconditionalmeandoesnotexist.Theconditionalmeanofthedisturbance,vt,iszero,butitsconditionalvarianceistσ2,whichshowsapeculiartypeofheteroscedasticity.Consi
derleastsquaresestimationofµwithm=(ty)/(tt),wheret=[1,2,3,...,T].ThenE[m]=µ+E[(tt)−1(tv)]=µ,but2T34σtO(T)1Var[m]=t=1==O.2[O(T3)]2T2Tt2t=1So,thevarianceofthisestimatorisanorderofmagnitudesmallerthanweareusedtoseeinginregressionmodels.Notonlyismmeansquareconsistent,itis“superconsis-tent.”Assuch,withoutdoingaformalderivation,weconcludethatthereissomething“unusual”aboutthisestimatorandthatthe√“usual”testingprocedureswhosedistribu-tionsbuildonthedistributionofT(m−µ)willnotbeappropriate;thevarianceofthisnormalizedstatisticconvergestozero.\nGreene-50240bookJune26,200221:55CHAPTER19✦ModelswithLaggedVariables573Thisresultdoesnotmeanthatthehypothesisγ=1isnottestableinthismodel.Infact,theappropriateteststatisticistheconventionalonethatwehavecomputedforcomparabletestsbefore.Buttheappropriatecriticalvaluesagainstwhichtomeasurethosestatisticsarequitedifferent.WewillreturntothisissueinourdiscussionoftheDickey–FullertestinSection20.3.4.19.4.2COMPUTATIONOFTHELAGWEIGHTSINTHEARDLMODELThedistributedlagformoftheARDLmodelisµB(L)11yt=+xt+δwt+εtC(L)C(L)C(L)C(L)µ∞∞∞=+αjxt−j+δθlwt−l+θlεt−l.1−γ1−···−γpj=0l=0l=0Thismodelprovidesamethodofapproximatingaverygenerallagstructure.InJorgenson’s(1966)study,inwhichhelabeledthismodelarationallagmodel,hedemon-stratedthatessentiallyanydesiredshapeforthelagdistributioncouldbeproducedwithrelativelyfewparameters.7Thelagcoefficientsonxt,xt−1,...intheARDLmodelaretheindividualtermsintheratioofpolynomialsthatappearinthedistributedlagform.WedenotetheseascoefficientsB(L)α,α,α,...=thecoefficienton1,L,L2,...in.(19-19)012C(L)Aconvenientwaytocomputethesecoefficientsistowrite(19-19)asA(L)C(L)=B(L).ThenwecanjustequatecoefficientsonthepowersofL.Example19.4demonstratestheprocedure.∞Thelong-runeffectinarationallagmodelisi=0αi.Thisresultiseasytocomputesinceitissimply∞B(1)αi=.C(1)i=0Astandarderrorforthelong-runeffectcanbecomputedusingthedeltamethod.19.4.3STABILITYOFADYNAMICEQUATIONInthegeometriclagmodel,wefoundthatastabilitycondition|λ|<1wasnecessaryforthemodeltobewellbehave
d.Similarly,intheAR(1)model,theautocorrelationparameterρmustberestrictedto|ρ|<1forthesamereason.Thedynamicmodelin(19-18)mustalsoberestricted,butinwaysthatarelessobvious.Consideronceagainthequestionofwhetherthereexistsanequilibriumvalueofyt.In(19-18),supposethatxtisfixedatsomevaluex¯,wtisfixedatzero,andthedistur-bancesεtarefixedattheirexpectationofzero.Wouldytconvergetoanequilibrium?7Alongliterature,highlightedbyGriliches(1967),Dhrymes(1971),Nerlove(1972),Maddala(1977a),andHarvey(1990),describesestimationofmodelsofthissort.\nGreene-50240bookJune26,200221:55574CHAPTER19✦ModelswithLaggedVariablesTherelevantdynamicequationisyt=α¯+γ1yt−1+γ2yt−2+···+γpyt−p,whereα¯=µ+B(1)x¯.Ifytconvergestoanequilibrium,then,thatequilibriumisµ+B(1)x¯α¯y¯==.C(1)C(1)Stabilityofadynamicequationhingesonthecharacteristicequationfortheautore-gressivepartofthemodel.Therootsofthecharacteristicequation,C(z)=1−γz−γz2−···−γzp=0,(19-20)12pmustbegreaterthanoneinabsolutevalueforthemodeltobestable.Totakeasimpleexample,thecharacteristicequationforthefirst-ordermodelswehaveexaminedthusfarisC(z)=1−λz=0.Thesinglerootofthisequationisz=1/λ,whichisgreaterthanoneinabsolutevalueif|λ|islessthanone.Therootsofamoregeneralcharacteristicequationarethereciprocalsofthecharacteristicrootsofthematrixγ1γ2γ3...γp−1γp100...00010...00C=.(19-21)001...00...000...10Sincethematrixisasymmetric,itsrootsmayincludecomplexpairs.Thereciprocalofthecomplexnumbera+biisa/M−(b/M)i,whereM=a2+b2andi2=−1.WethusrequirethatMbelessthan1.Thecaseofz=1,theunitrootcase,isoftenofspecialinterest.IfoneoftheprootsofC(z)=0is1,thenitfollowsthati=1γi=1.ThisassumptionwouldappeartobeasimplehypothesistotestintheframeworkoftheARDLmodel.Instead,wefindtheexplosivecasethatweexaminedinSection19.4.1,sothehypothesisismorecomplicatedthanitfirstappears.Toreiterate,underthenullhypothesisthatC(1)=0,itisnotpossibleforthestandardFstatistictohaveacentralFdistributionbecauseofthebehaviorofthevariablesinthemodel.Wewillreturntothiscaseshortly.Theunivariateautoregression,yt=µ+γ1
yt−1+γ2yt−2+···+γpyt−p+εt,canbeaugmentedwiththep−1equationsyt−1=yt−1,yt−2=yt−2,andsoontogiveavectorautoregression,VAR(tobeconsideredinthenextsection):yt=µ+Cyt−1+εt,\nGreene-50240bookJune26,200221:55CHAPTER19✦ModelswithLaggedVariables575whereyhaspelementsε=(ε,0,...)andµ=(µ,0,0,...).Sinceitwillultimatelytttnotberelevanttothesolution,wewillletεtequalitsexpectedvalueofzero.Now,bysuccessivesubstitution,weobtainy=µ+Cµ+C2µ+···,twhichmayormaynotconverge.WriteCinthespectralformC=PQ,whereQP=Iandisadiagonalmatrixofthecharacteristicroots.(NotethatthecharacteristicrootsinandvectorsinPandQmaybecomplex.)Wethenobtain∞y=PiQµ.(19-22)ti=0IfalltherootsofCarelessthanoneinabsolutevalue,thenthisvectorwillconvergetotheequilibriumy=(I−C)−1µ.∞NonexplosionofthepowersoftherootsofCisequivalentto|λp|<1,or|1/λp|>1,whichwasouroriginalrequirement.NotefinallythatsinceµisamultipleofthefirstcolumnofI,itmustbethecasethateachelementinthefirstcolumnof(I−C)−1ispthesame.Atequilibrium,therefore,wemusthaveyt=yt−1=···=y∞.Example19.4ARationalLagModelAppendixTableF5.1listsquarterlydataonanumberofmacroeconomicvariablesincludingconsumptionanddisposableincomefortheU.S.economyfortheyears1950to2000,atotalof204quarters.Themodelct=δ+β0yt+β1yt−1+β2yt−2+β3yt−3+γ1ct−1+γ2ct−2+γ3ct−3+εtisestimatedusingthelogarithmsofconsumptionanddisposableincome,denotedctandyt.OrdinaryleastsquaresestimatesoftheparametersoftheARDL(3,3)modelarect=0.7233ct−1+0.3914ct−2−0.2337ct−3+0.5651yt−0.3909yt−1−0.2379yt−2+0.902yt−3+et.(Afullsetofquarterlydummyvariablesisomitted.)TheDurbin–Watsonstatisticis1.78957,soremainingautocorrelationseemsunlikelytobeaconsideration.Thelagcoefficientsaregivenbytheequality22323(α0+α1L+α2L+···)(1−γ1L−γ2L−γ3L)=(β0+β1L+β2L+β3L).NotethatA(L)isaninfinitepolynomial.Thelagcoefficientsare1:α0=β0(whichwillalwaysbethecase),L1:−αγ+α=βorα=β+αγ,01111101L2:−αγ−αγ+α=βorα=β+αγ+αγ,021122220211L3:−αγ−αγ−αγ+α=βorα=β+αγ+αγ+αγ,0312213333031221L4:−αγ−αγ−αγ+α=0orα=γα+γα+γα,13223144132231Lj:−αγ−αγ−αγ+α=0orα=γα+γα+γα,j=5,6,...j−33j−22j−11jj1
j−12j−23j−3andsoon.Fromthefifthtermonward,theseriesoflagcoefficientsfollowstherecursionαj=γ1αj−1+γ2αj−2+γ3αj−3,whichisthesameastheautoregressivepartoftheARDLmodel.Theseriesoflagweightsfollowsthesamedifferenceequationasthecurrentand\nGreene-50240bookJune26,200221:55576CHAPTER19✦ModelswithLaggedVariablesTABLE19.3LagCoefficientsinaRationalLagModelLag01234567ARDL.565.018−.004.062.039.054.039.041Unrestricted.954−.090−.063.100−.024.057−.112.236laggedvaluesofytafterrinitialvalues,whereristheorderoftheDLpartoftheARDLmodel.ThethreecharacteristicrootsoftheCmatrixare0.8631,−0.5949,and0.4551.Sinceallarelessthanone,weconcludethatthestochasticdifferenceequationisstable.ThefirstsevenlagcoefficientsoftheestimatedARDLmodelarelistedinTable19.3withthefirstsevencoefficientsinanunrestrictedlagmodel.ThecoefficientsfromtheARDLmodelonlyvaguelyresemblethosefromtheunrestrictedmodel,buttheerraticswingsofthelatterarepreventedbythesmoothequationfromthedistributedlagmodel.Theestimatedlong-termeffects(withstandarderrorsinparentheses)fromthetwomodelsare1.0634(0.00791)fromtheARDLmodeland1.0570(0.002135)fromtheunrestrictedmodel.Surprisingly,inviewofthelargeandhighlysignificantestimatedcoefficients,thelaggedeffectsfalloffessentiallytozeroaftertheinitialimpact.19.4.4FORECASTINGConsider,first,aone-period-aheadforecastofytintheARDL(p,r)model.Itwillbeconvenienttocollectthetermsinµ,xt,wt,andsooninasingleterm,rµt=µ+βjxt−j+δwt.j=0Now,theARDLmodelisjustyt=µt+γ1yt−1+···+γpyt−p+εt.ConditionedonthefullsetofinformationavailableuptotimeTandonforecastsoftheexogenousvariables,theone-period-aheadforecastofytwouldbeyˆT+1|T=µˆT+1|T+γ1yT+···+γpyT−p+1+εˆT+1|T.Toformapredictioninterval,wewillbeinterestedinthevarianceoftheforecasterror,eT+1|T=yˆT+1|T−yT+1.Thiserrorwillarisefromthreesources.First,inforecastingµt,therewillbetwosourcesoferror.Theparameters,µ,δ,andβ0,...,βrwillhavebeenestimated,soµˆT+1|TwilldifferfromµT+1becauseofthesamplingvariationintheseestimators.Second,iftheexogenousvariables,xT+1andwT+1havebeenforecaste
d,thentotheextentthattheseforecastsarethemselvesimperfect,yetanothersourceoferrortotheforecastwillresult.Finally,althoughwewillforecastεT+1withitsexpectationofzero,wewouldnotassumethattheactualrealizationwillbezero,sothisstepwillbeathirdsourceoferror.Inprinciple,anestimateoftheforecastvariance,Var[eT+1|T],wouldaccountforallthreesourcesoferror.Inpractice,handlingthesecondoftheseerrorsislargelyintractablewhilethefirstismerelyextremelydifficult.[SeeHarvey(1990)andHamilton(1994,especiallySection11.7)forusefuldiscussion.McCullough(1996)presentsresultsthatsuggestthat“intractable”maybetoopessimistic.]Forthemoment,wewillconcentrateonthethirdsourceandreturntotheotherissuesbrieflyattheendofthesection.\nGreene-50240bookJune26,200221:55CHAPTER19✦ModelswithLaggedVariables577IgnoringforthemomentthevariationinµˆT+1|T—thatis,assumingthattheparam-etersareknownandtheexogenousvariablesareforecastedperfectly—thevarianceoftheforecasterrorwillbesimplyVar[e|x,w,µ,β,δ,y,...]=Var[ε]=σ2,T+1|TT+1T+1TT+1soatleastwithintheseassumptions,formingtheforecastandcomputingtheforecastvariancearestraightforward.Also,atthisfirststep,giventhedatausedfortheforecast,thefirstpartofthevarianceisalsotractable.LetzT+1=[1,xT+1,xT,...,xT−r+1,wT,yT,yT−1,...,yT−p+1],andletθˆdenotethefullestimatedparametervector.ThenwewoulduseEst.Var[e|z]=s2+zEst.Asy.Var[θˆ]z.T+1|TT+1T+1T+1Now,considerforecastingfurtheroutbeyondthesampleperiod:yˆT+2|T=µˆT+2|T+γ1yˆT+1|T+···+γpyT−p+2+εˆT+2|T.NotethatforperiodT+1,theforecastedyT+1isused.MakingthesubstitutionforyˆT+1|T,wehaveyˆT+2|T=µˆT+2|T+γ1(µˆT+1|T+γ1yT+···+γpyT−p+1+εˆT+1|T)+···+γpyT−p+2+εˆT+2|Tand,likewise,forsubsequentperiods.Ourmethodwillbesimplifiedconsiderablyifweusethedeviceweconstructedintheprevioussection.Forthefirstforecastperiod,writetheforecastwiththepreviousplaggedvaluesasyˆT+1|TµˆT+1|Tγ1γ2···γpyTεˆT+1|TyT010···0yT−10y=0+y+0.T−101···0T−2........0···10....Thecoefficientmatrixontheright-handsideisC,whichwedefinedin(19-21).Tomaintainthethreadofthediscussion,wewillcontinuetou
sethenotationµˆT+1|Tfortheforecastofthedeterministicpartofthemodel,althoughforthepresent,weareassumingthatthisvalue,aswellasC,isknownwithcertainty.Withthismodification,then,ourforecastisthetopelementofthevectorofforecasts,yˆT+1|T=µˆT+1|T+CyT+εˆT+1|T.Sinceweareassumingthateverythingontheright-handsideisknownexcepttheperiodT+1disturbance,thecovariancematrixforthisp+1vectorisσ20···.E[(yˆ−y)(yˆ−y)]=00..,T+1|TT+1T+1|TT+1.....···.andtheforecastvarianceforyˆisjusttheupperleftelement,σ2.T+1|TNow,extendthisnotationtoforecastingouttoperiodsT+2,T+3,andsoon:yˆT+2|T=µˆT+2|T+CyˆT+1|T+εˆT+2|T=µˆ+Cµˆ+C2y+εˆ+Cεˆ.T+2|TT+1|TTT+2|TT+1|T\nGreene-50240bookJune26,200221:55578CHAPTER19✦ModelswithLaggedVariablesOnceagain,theonlyunknownsarethedisturbances,sotheforecastvarianceforthistwo-period-aheadforecastedvectorisσ20···σ20···..Var[εˆ+Cεˆ]=00..+C00..C.T+2|TT+1|T.........···..···.Thus,theforecastvarianceforthetwo-step-aheadforecastisσ2[1+(1)],where11(1)isthe1,1elementof(1)=CjjC,wherej=[σ,0,...,0].Byextendingthis11devicetoaforecastFperiodsbeyondthesampleperiod,weobtainFFyˆ=Cf−1µˆ+CFy+Cf−1εˆ.(19-23)T+F|TT+F−(f−1)|TTt+F−(f−1)|Tf=1f=1Thisequationshowshowtocomputetheforecasts,whichisreasonablysimple.Wealsoobtainourexpressionfortheconditionalforecastvariance,ConditionalVar[yˆ]=σ2[1+(1)+(2)+···+(F−1)],(19-24)T+F|T111111where(i)=CijjCi.ThegeneralformoftheF-period-aheadforecastshowshowtheforecastswillbehaveastheforecastperiodextendsfurtheroutbeyondthesampleperiod.Iftheequationisstable—thatis,ifallrootsofthematrixCarelessthanoneinabsolutevalue—thenCFwillconvergetozero,andsincetheforecasteddisturbancesarezero,theforecastwillbedominatedbythesuminthefirstterm.Ifwesuppose,inaddition,thattheforecastsoftheexogenousvariablesarejusttheperiodT+1forecastedvaluesandnotrevised,then,aswefoundattheendoftheprevioussection,theforecastwillultimatelyconvergetolimyˆ|µˆ=[I−C]−1µˆ.T+F|TT+1|TT+1|TF→∞Toaccountfullyforallsourcesofvariationintheforecasts,wewouldhavetorevisetheforecastvariancetoincludethevariationintheforecas
tsoftheexogenousvariablesandthevariationintheparameterestimates.Asnoted,thefirstoftheseislikelytobeintractable.Forthesecond,thisrevisionwillbeextremelydifficult,themoresowhenwealsoaccountforthematrixC,aswellasthevectorµ,beingbuiltupfromtheestimatedparameters.Oneconsolationisthatinthepresenceofalaggedvalueofthedependentvariable,asγapproachesone,theparametervariancestendtoorder1/T2ratherthanthe1/Tweareaccustomedto.Withthisfasterconvergence,thevariationduetoparameterestimationbecomeslessimportant.(SeeSection20.3.3forrelatedresults.)Thelevelofdifficultyinthiscasefallsfromimpossibletomerelyextremelydifficult.Inprinciple,whatisrequiredisEst.ConditionalVar[yˆ]=σ2[1+(1)+(2)+···+(F−1)]T+F|T111111+gEst.Asy.Var[µˆ,βˆ,γˆ]g,where∂yˆT+Fg=.∂[µˆ,βˆ,γˆ][SeeHamilton(1994,AppendixtoChapter11)forformalderivation.]\nGreene-50240bookJune26,200221:55CHAPTER19✦ModelswithLaggedVariables579Onepossibilityistousethebootstrapmethod.Forthisapplication,bootstrappingwouldinvolvesamplingnewsetsofdisturbancesfromtheestimateddistributionofεt,andthenrepeatedlyrebuildingthewithinsampletimeseriesofobservationsonytbyusingyˆt=µˆt+γ1yt−1+···+γpyt−p+ebt(m),whereebt(m)istheestimated“bootstrapped”disturbanceinperiodtduringreplica-tionm.TheprocessisrepeatedMtimes,withnewparameterestimatesandanewforecastgeneratedineachreplication.Thevarianceoftheseforecastsproducestheestimatedforecastvariance.819.5METHODOLOGICALISSUESINTHEANALYSISOFDYNAMICMODELS19.5.1ANERRORCORRECTIONMODELConsidertheARDL(1,1)model,whichhasbecomeaworkhorseofthemodernlit-eratureontime-seriesanalysis.Bydefiningthefirstdifferencesyt=yt−yt−1andxt=xt−xt−1wecanrearrangeyt=µ+γ1yt−1+β0xt+β1xt−1+εttoobtainyt=µ+β0xt+(γ1−1)(yt−1−θxt−1)+εt,(19-25)whereθ=−(β0+β1)/(γ1−1).Thisformofthemodelisintheerrorcorrectionform.Inthisform,wehaveanequilibriumrelationship,yt=µ+β0xt+εt,andtheequilibriumerror,(γ1−1)(yt−1−θxt−1),whichaccountforthedeviationofthepairofvariablesfromthatequilibrium.Themodelstatesthatthechangeinytfromthepreviousperiodconsistsofthechangeassociatedwi
thmovementwithxtalongthelong-runequilibriumpathplusapart(γ1−1)ofthedeviation(yt−1−θxt−1)fromtheequilibrium.Withamodelinlogs,thisrelationshipwouldbeinproportionalterms.Itisusefulatthisjuncturetojumpaheadabit—wewillreturntothistopicinsomedetailinChapter20—andexplorewhytheerrorcorrectionformmightbesuchausefulformulationofthissimplemodel.ConsidertheloggedconsumptionandincomedataplottedinFigure19.3.Itisobviousoninspectionofthefigurethatasimpleregressionofthelogofconsumptiononthelogofincomewouldsuggestahighlysignificantrelationship;infact,thesimplelinearregressionproducesaslopeof1.0567withatratioof440.5(!)andanR2of0.99896.ThedisturbingresultofalineofliteratureineconometricsthatbeginswithGrangerandNewbold(1974)andcontinuestothepresentisthatthisseeminglyobviousandpowerfulrelationshipmightbeentirelyspurious.Equallyobviousfromthefigureisthatbothctandytaretrendingvariables.If,infact,bothvariablesunconditionallywererandomwalkswithdriftofthesortthatwemetattheendofSection19.4.1—thatis,ct=tµc+vtandlikewiseforyt—thenwewouldalmostcertainlyobserveafiguresuchas19.3andcompellingregressionresultssuchasthose,eveniftherewerenorelationshipatall.Inaddition,thereisampleevidence8BernardandVeall(1987)giveanapplicationofthistechnique.See,also,McCullough(1996).\nGreene-50240bookJune26,200221:55580CHAPTER19✦ModelswithLaggedVariables9.6CTYT9.08.4Variable7.87.26.619491962197519882001QuarterFIGURE19.3ConsumptionandIncomeData.intherecentliteraturethatlow-frequency(infrequentlyobserved,aggregatedoverlongperiods)flowvariablessuchasconsumptionandoutputare,indeed,oftenwelldescribedasrandomwalks.Insuchdata,theARDL(1,1)modelmightappeartobeentirelyappropriateevenifitisnot.So,howisonetodistinguishbetweenthespuriousregressionandagenuinerelationshipasshownintheARDL(1,1)?Thefirstdifferenceofconsumptionproducesct=µc+vt−vt−1.Iftherandomwalkpropositionisindeedcorrect,thenthespuriousappearanceofregressionwillnotsurvivethefirstdifferencing,whereasifthereisarelationshipbetweenctandyt,thenitwillbepreservedintheerrorco
rrectionmodel.WewillreturntothisissueinChapter20,whenweexaminetheissueofintegrationandcointegrationofeconomicvariables.Example19.5AnErrorCorrectionModelforConsumptionTheerrorcorrectionmodelisanonlinearregressionmodel,althoughinfactitisintrinsicallylinearandcanbededucedsimplyfromtheunrestrictedformdirectlyaboveit.Sincetheparameterθisactuallyofsomeinterest,itmightbemoreconvenienttousenonlinearleastsquaresandfitthesecondformdirectly.(Sincethemodelisintrinsicallylinear,thenonlinearleastsquaresestimateswillbeidenticaltothederivedlinearleastsquaresestimates.)ThelogsofconsumptionandincomedatainAppendixTableF5.1areplottedinFigure19.3.Notsurprisingly,thetwovariablesaredriftingupwardtogether.Theestimatederrorcorrectionmodel,withestimatedstandarderrorsinparentheses,isct−ct−1=−0.08533+(0.90458−1)[ct−1−1.06034yt−1]+0.58421(yt−yt−1).(0.02899)(0.03029)(0.01052)(0.05090)TheestimatedequilibriumerrorsareshowninFigure19.4.Notethattheyareallpositive,butthatineachperiod,theadjustmentisintheoppositedirection.Thus(accordingtothismodel),whenconsumptionisbelowitsequilibriumvalue,theadjustmentisupward,asmightbeexpected.\nGreene-50240bookJune26,200221:55CHAPTER19✦ModelswithLaggedVariables581.095.090EQERROR.085.08019501963197619892002QuarterFIGURE19.4Consumption–IncomeEquilibriumErrors.19.5.2AUTOCORRELATIONThedisturbanceintheerrorcorrectionmodelisassumedtobenonautocorrelated.AswesawinChapter12,autocorrelationinamodelcanbeinducedbymisspecification.Anorthodoxviewofthemodelingprocessmightstate,infact,thatthismisspecificationistheonlysourceofautocorrelation.Althoughadmittedlyabitoptimisticinitsimplication,thismisspecificationdoesraiseaninterestingmethodologicalquestion.ConsideronceagainthesimplestmodelofautocorrelationfromChapter12(withasmallchangeinnotationtomakeitconsistentwiththepresentdiscussion),yt=βxt+vt,vt=ρvt−1+εt,(19-26)whereεtisnonautocorrelated.Aswefoundearlier,thismodelcanbewrittenasyt−ρyt−1=β(xt−ρxt−1)+εt(19-27)oryt=ρyt−1+βxt−βρxt−1+εt.(19-28)ThismodelisanARDL(1,1)modelinwhichβ1=−γ1β0
.Thus,wecanview(19-28)asarestrictedversionofyt=γ1yt−1+β0xt+β1xt−1+εt.(19-29)Thecrucialpointhereisthatthe(nonlinear)restrictionon(19-29)istestable,sothereisnocompellingreasontoproceedto(19-26)firstwithoutestablishingthattherestrictionisinfactconsistentwiththedata.TheupshotisthattheAR(1)disturbancemodel,asageneralproposition,isatestablerestrictiononasimpler,linearmodel,notnecessarilyastructureuntoitself.\nGreene-50240bookJune26,200221:55582CHAPTER19✦ModelswithLaggedVariablesNow,letustakethisargumenttoitslogicalconclusion.TheAR(p)disturbancemodel,vt=ρ1vt−1+···+ρpvt−p+εt,orR(L)vt=εt,canbewritteninitsmovingaverageformasεtvt=.R(L)[Recall,intheAR(1)model,thatε=u+ρu+ρ2u+···.]Theregressionmodelttt−1t−2withthisAR(p)disturbanceis,therefore,εtyt=βxt+.R(L)ButconsiderinsteadtheARDL(p,p)modelC(L)yt=βB(L)xt+εt.ThesecoefficientsarethesamemodelifB(L)=C(L).TheimplicationisthatanymodelwithanAR(p)disturbancecanbeinterpretedasanonlinearlyrestrictedversionofanARDL(p,p)model.Theprecedingdiscussionisaratherorthodoxviewofautocorrelation.Itispred-icatedontheAR(p)model.Researchershavefoundthatamoreinvolvedmodelfortheprocessgeneratingεtissometimescalledfor.Ifthetime-seriesstructureofεtisnotautoregressive,muchoftheprecedinganalysiswillbecomeintractable.Assuch,thereremainsroomfordisagreementwiththestrongconclusions.Wewillturntomodelswhosedisturbancesaremixturesofautoregressiveandmoving-averageterms,whichwouldbebeyondthereachofthisapparatus,inChapter20.19.5.3SPECIFICATIONANALYSISTheusualexplanationofautocorrelationisserialcorrelationinomittedvariables.TheprecedingdiscussionandourresultsinChapter12suggestanothercandidate:misspec-ificationofwhatwouldotherwisebeanunrestrictedARDLmodel.Thus,uponfindingevidenceofautocorrelationonthebasisofaDurbin–WatsonstatisticoranLMstatistic,wemightfindthatrelaxingthenonlinearrestrictionsontheARDLmodelisaprefer-ablenextstepto“correcting”fortheautocorrelationbyimposingtherestrictionsandrefittingthemodelbyFGLS.SinceanARDL(p,r)modelwithARdisturbances,evenwithp=0,isimpli
citlyanARDL(p+d,r+d)model,wheredisusuallyone,theap-proachsuggestedisjusttoaddadditionallagsofthedependentvariabletothemodel.Thus,onemightevenaskwhywewouldeverusethefamiliarFGLSprocedures.[See,e.g.,Mizon(1995).]ThepayoffisthattherestrictionsimposedbytheFGLSprocedureproduceamoreefficientestimatorthanothermethods.Iftherestrictionsareinfactappropriate,thennotimposingthemamountstonotusinginformation.Arelatedquestionnowarises,apartfromtheissueofautocorrelation.InthecontextoftheARDLmodel,howshouldonedothespecificationsearch?(ThisquestionisnotspecifictotheARDLoreventothetime-seriessetting.)Isitbettertostartwithasmallmodelandexpandituntilconventionalfitmeasuresindicatethatadditionalvariablesarenolongerimprovingthemodel,orisitbettertostartwithalargemodelandpareawayvariablesthatconventionalstatisticssuggestaresuperfluous?Thefirststrategy,\nGreene-50240bookJune26,200221:55CHAPTER19✦ModelswithLaggedVariables583goingfromasimplemodeltoageneralmodel,islikelytobeproblematic,becausethestatisticscomputedforthenarrowermodelarebiasedandinconsistentifthehypothesisisincorrect.Consider,forexample,anLMtestforautocorrelationinamodelfromwhichimportantvariableshavebeenomitted.Theresultsarebiasedinfavorofafindingofautocorrelation.Thealternativeapproachistoproceedfromageneralmodeltoasimpleone.Thus,onemightoverfitthemodelandthensubjectittowhateverbatteryoftestsareappropriatetoproducethecorrectspecificationattheendoftheprocedure.Inthisinstance,theestimatesandteststatisticscomputedfromtheoverfitmodel,althoughinefficient,arenotgenerallysystematicallybiased.(Wehaveencounteredthisissueatseveralpoints.)Thelatterapproachiscommoninmodernanalysis,butsomewordsofcautionareneeded.Theprocedureroutinelyleadstooverfittingthemodel.Atypicaltime-seriesanalysismightinvolvespecifyingamodelwithdeeplagsonallthevariablesandthenparingawaythemodelasconventionalstatisticsindicate.Thedangeristhattheresultingmodelmighthaveanautoregressivestructurewithpeculiarholesinitthatwouldbehardtojustifywithanytheory.Thus,amodelforquart
erlydatathatincludeslagsof2,3,6,and9onthedependentvariablewouldlooksuspiciouslyliketheendresultofacomputer-drivenfishingtripand,moreover,mightnotsurviveevenmoderatechangesintheestimationsample.[AsHendry(1995)notes,amodelinwhichthelargestandmostsignificantlagcoefficientoccursatthelastlagissurelymisspecified.]19.5.4COMMONFACTORRESTRICTIONSTheprecedingdiscussionsuggeststhatevidenceofautocorrelationinatime-seriesregressionmodelmightsignalmorethanmerelyaneedtousegeneralizedleastsquarestomakeefficientuseofthedata.[SeeHendry(1993).]Ifwefindevidenceofautocor-relationbased,say,ontheDurbin–WatsonstatisticoronDurbin’shstatistic,thenitwouldmakesensetotestthehypothesisoftheAR(1)modelthatmightnormallybethenextstepagainstthealternativepossibilitythatthemodelismerelymisspecified.Thetestissuggestedby(19-27)and(19-28).Ingeneral,wecanformulateitasatestofH:y=xβ+ρy−ρ(xβ)+ε0ttt−1t−1tversusH:y=xβ+ρy+xγ+ε.1ttt−1t−1tThenullmodelisobtainedfromthealternativebythenonlinearrestrictionγ=−ρβ.Sincethemodelsarebothclassicalregressionmodels,thetestcanbecarriedoutbyreferringtheFstatistic,(ee−ee)/J0011F[J,T−K1]=,e1e1/(T−K)totheappropriatecriticalvaluefromtheFdistribution.Thetestisonlyasymptoticallyvalidbecauseofthenonlinearityoftherestrictedregressionandbecauseofthelaggeddependentvariablesinthemodels.Therearetwoadditionalcomplicationsinthisproce-dure.First,theunrestrictedmodelmaybeunidentifiedbecauseofredundantvariables.Forexample,itwillusuallyhavetwoconstantterms.Ifbothztandzt−1appearintherestrictedequation,thenzt−1willappeartwiceintheunrestrictedmodel,andsoon.\nGreene-50240bookJune26,200221:55584CHAPTER19✦ModelswithLaggedVariablesThesolutionissimple;justdroptheredundantvariables.Thesumofsquareswithouttheredundantvariableswillbeidenticaltothatwiththem.Second,atfirstblush,therestrictionsinthenonlinearmodelappearcomplicated.Therestrictedmodel,however,isactuallyquitestraightforward.Rewriteitinafamiliarform:H:y=ρy+(x−ρx)β+ε.0tt−1tt−1tGivenρ,theregressionislinear.Inthisform,thegridsearchoverthevaluesofρca
n be used to obtain the full set of estimates. (Cochrane–Orcutt and the other two-step estimators are likely not to be the best solution.) Also, it is important to search the full [0, 1] range to allow for the possibility of local minima of the sum of squares. Depending on the available software, it may be equally simple just to fit the nonlinear regression model directly.

Higher-order models can be handled analogously. In an AR(1) model, this “common factor” restriction (the reason for the name will be clear shortly) takes the form
$$(1 - \gamma L)y_t = (\beta_0 + \beta_1 L)x_t + \varepsilon_t, \quad \beta_1 = -\gamma\beta_0.$$
Consider, instead, an AR(2) model. The “restricted” and unrestricted models would appear as
$$H_0: (1 - \rho_1 L - \rho_2 L^2)y_t = (1 - \rho_1 L - \rho_2 L^2)x_t'\beta + \varepsilon_t,$$
$$H_1: y_t = \gamma_1 y_{t-1} + \gamma_2 y_{t-2} + x_t'\beta_0 + x_{t-1}'\beta_1 + x_{t-2}'\beta_2 + \varepsilon_t,$$
so the full set of restrictions is $\beta_1 = -\gamma_1\beta_0$ and $\beta_2 = -\gamma_2\beta_0$. This expanded model can be handled analogously to the AR(1) model. Once again, an F test of the nonlinear restrictions can be used.

This approach neglects another possibility. The restricted model above goes the full distance from the unrestricted model to the AR(2) autocorrelation model. There is an intermediate possibility. The polynomials in the lag operator, $C(L)$ and $B(L)$, can be factored into products of linear, primitive terms. A quadratic equation in $L$, for example, may always be written as
$$C(L) = (1 - \gamma_1 L - \gamma_2 L^2) = (1 - \lambda_1 L)(1 - \lambda_2 L),$$
where the $\lambda$'s are the roots of the characteristic polynomial $C(z) = 0$. Here, $B(L)$ may be factored likewise, say into $(1 - \tau_1 L)(1 - \tau_2 L)$. (These “roots” may include pairs of imaginary values.) With these results in hand, rewrite the basic model $C(L)y_t = B(L)x_t + \varepsilon_t$ in the form
$$(1 - \lambda_1 L)(1 - \lambda_2 L)y_t = (1 - \tau_1 L)(1 - \tau_2 L)x_t'\beta + \varepsilon_t.$$
Now suppose that $\lambda_1 = \tau_1 = \rho$. Dividing through both sides of the equation by $(1 - \rho L)$ produces the restricted model
$$(1 - \lambda_2 L)y_t = (1 - \tau_2 L)x_t'\beta + \frac{\varepsilon_t}{1 - \rho L}.$$
The restricted model is a lower-order autoregression, which has some virtue, but now, by construction, its disturbance is an AR(1) process in $\rho$. (This conclusion was expected, of course, since we reached it in reverse at the beginning of this section.) The restricted model is appropriate only if the two polynomials have a common factor, $(1 - \lambda_1 L) = (1 - \tau_1 L)$, hence the name for the procedure.

It is useful to develop this procedure in more detail for an ARDL(2, 2) model. Write the distributed lag part, $B(L)$, as $\beta_0(1 - \beta_1 L - \beta_2 L^2)$. Multiplying out the factors, we see that the unrestricted model,
$$y_t = \mu + \gamma_1 y_{t-1} + \gamma_2 y_{t-2} + \beta_0(1 - \beta_1 L - \beta_2 L^2)x_t + \varepsilon_t,$$
can be written as
$$y_t = \mu + (\lambda_1 + \lambda_2)y_{t-1} - (\lambda_1\lambda_2)y_{t-2} + \beta_0 x_t - \beta_0(\tau_1 + \tau_2)x_{t-1} + \beta_0(\tau_1\tau_2)x_{t-2} + \varepsilon_t.$$
Despite what appears to be extreme nonlinearity, this equation is intrinsically linear. In fact, it cannot be estimated in this form by nonlinear least squares, since any pair of values $\lambda_1, \lambda_2$ that one might find can just be reversed and the function and sum of squares will not change. The same is true for pairs of $\tau_1, \tau_2$. Of course, this information is irrelevant to the solution, since the model can be fit by ordinary linear least squares in the ARDL form just above it, and for the test, we only need the sum of squares. But now impose the common factor restriction $(1 - \lambda_1 L) = (1 - \tau_1 L)$, or $\lambda_1 = \tau_1$. The now very nonlinear regression model
$$y_t = \mu + (\tau_1 + \lambda_2)y_{t-1} - (\tau_1\lambda_2)y_{t-2} + \beta_0 x_t - \beta_0(\tau_1 + \tau_2)x_{t-1} + \beta_0(\tau_1\tau_2)x_{t-2} + \varepsilon_t$$
has six terms on the right-hand side but only five parameters and is overidentified. This model can be fit as is by nonlinear least squares. The F test of one restriction suggested earlier can now be carried out. Note that this test of one common factor restriction is a test of the hypothesis of the ARDL(1, 1) model with an AR(1) disturbance against the unrestricted ARDL(2, 2) model. Turned around, we note, once again, a finding of autocorrelation in the ARDL(1, 1) model does not necessarily suggest that one should just use GLS. The appropriate next step might be to expand the model. Finally, testing both common factor restrictions in this model is equivalent to testing the two restrictions $\gamma_1 = \rho_1$ and $\gamma_2 = \rho_2$ in the model
$$y_t = \gamma_1 y_{t-1} + \gamma_2 y_{t-2} + \beta(x_t - \rho_1 x_{t-1} - \rho_2 x_{t-2}) + \varepsilon_t.$$
The unrestricted model is the linear ARDL(2, 2) we used earlier. The restricted model is nonlinear, but it can be estimated easily by nonlinear least squares.

The analysis of common factors in models more complicated than ARDL(2, 2) is extremely involved. [See Hendry (1993) and Hendry and Doornik (1996).]

Example 19.6 Testing Common Factor Restrictions
The consumption and income data used in Example 19.5 (quarters 1950.3 to 2000.4) are used to fit an unrestricted ARDL(2, 2) model,
$$c_t = \mu + \gamma_1 c_{t-1} + \gamma_2 c_{t-2} + \beta_0 y_t + \beta_1 y_{t-1} + \beta_2 y_{t-2} + \varepsilon_t.$$
Ordinary least squares estimates of the parameters appear in Table 19.4. For the one common factor model, the parameters are formulated as
$$c_t = \mu + (\tau_1 + \lambda_2)c_{t-1} - (\tau_1\lambda_2)c_{t-2} + \beta_0 y_t - \beta_0(\tau_1 + \tau_2)y_{t-1} + \beta_0(\tau_1\tau_2)y_{t-2} + \varepsilon_t.$$
The structural parameters are computed using nonlinear least squares and then the ARDL coefficients are computed from these. A two common factors model is obtained by imposing the additional restriction $\lambda_2 = \tau_2$. The resulting model is the familiar one,
$$c_t = \mu + \rho_1 c_{t-1} + \rho_2 c_{t-2} + \beta_0(y_t - \rho_1 y_{t-1} - \rho_2 y_{t-2}) + \varepsilon_t.$$

TABLE 19.4 Estimated Autoregressive Distributed Lag Models

Restrictions   µ            γ1          γ2          β0          β1          β2          e'e
2              0.04020      0.6959      0.03044     0.5710      −0.3974     −0.1739     0.0091238
               (0.006397)   (0.06741)   (0.06747)   (0.04229)   (0.04563)   (0.04206)
               [Estimated: ρ1 = 0.6959, ρ2 = 0.3044]
1              −0.006499    0.6456      −0.2724     0.5972      0.6104      −0.2596     0.0088736
               (0.02959)    (0.06866)   (0.06784)   (0.04342)   (0.07225)   (0.06685)
               [Estimated: τ1 = −0.2887, τ2 = 0.8992, λ2 = 0.9433]
0              −0.06628     0.6487      0.2766      0.6126      −0.4004     −0.1329     0.0088626
               (0.03014)    (0.07066)   (0.06935)   (0.05408)   (0.08759)   (0.06218)

Standard errors are given in parentheses. As expected, they decline generally as the restrictions are added. The sum of squares increases at the same time. The F statistic for one restriction is
$$F = \frac{(0.0088736 - 0.0088626)/1}{0.0088626/(202 - 6)} = 0.243.$$
The 95 percent critical value from the F[1, 119] table is 3.921, so the hypothesis of the single common factor cannot be rejected. The F statistic for two restrictions is 5.777 against a critical value of 3.072, so the hypothesis of the AR(2) disturbance model is rejected.

19.6 VECTOR AUTOREGRESSIONS

The preceding discussions can be extended to sets of variables. The resulting autoregressive model is
$$y_t = \mu + \Gamma_1 y_{t-1} + \cdots + \Gamma_p y_{t-p} + \varepsilon_t, \tag{19-30}$$
where $\varepsilon_t$ is a vector of nonautocorrelated disturbances (innovations) with zero means and contemporaneous covariance matrix $E[\varepsilon_t\varepsilon_t'] = \Omega$. This equation system is a vector autoregression, or VAR. Equation (19-30) may also be written as
$$\Gamma(L)y_t = \mu + \varepsilon_t$$
where $\Gamma(L)$ is a matrix of polynomials in the lag operator. The individual equations are
$$y_{mt} = \mu_m + \sum_{j=1}^{p}(\Gamma_j)_{m1}y_{1,t-j} + \sum_{j=1}^{p}(\Gamma_j)_{m2}y_{2,t-j} + \cdots + \sum_{j=1}^{p}(\Gamma_j)_{mM}y_{M,t-j} + \varepsilon_{mt},$$
where $(\Gamma_j)_{lm}$ indicates the $(l, m)$ element of $\Gamma_j$.

VARs have been used primarily in macroeconomics. Early in their development, it was argued by some authors [e.g., Sims (1980), Litterman (1979, 1986)] that VARs would forecast better than the sort of
structural equation models discussed in Chapter 15. One could argue that as long as $\mu$ includes the current observations on the (truly) relevant exogenous variables, the VAR is simply an overfit reduced form of some simultaneous equations model. [See Hamilton (1994, pp. 326–327).] The overfitting results from the possible inclusion of more lags than would be appropriate in the original model. (See Example 19.8 for a detailed discussion of one such model.) On the other hand, one of the virtues of the VAR is that it obviates a decision as to what contemporaneous variables are exogenous; it has only lagged (predetermined) variables on the right-hand side, and all variables are endogenous.

The motivation behind VARs in macroeconomics runs deeper than the statistical issues.⁹ The large structural equations models of the 1950s and 1960s were built on a theoretical foundation that has not proved satisfactory. That the forecasting performance of VARs surpassed that of large structural models (some of the later counterparts to Klein's Model I ran to hundreds of equations) signaled to researchers a more fundamental problem with the underlying methodology. The Keynesian style systems of equations describe a structural model of decisions (consumption, investment) that seem loosely to mimic individual behavior; see Keynes's formulation of the consumption function in Example 1.1 that is, perhaps, the canonical example. In the end, however, these decision rules are fundamentally ad hoc, and there is little basis on which to assume that they would aggregate to the macroeconomic level anyway. On a more practical level, the high inflation and high unemployment experienced in the 1970s were very badly predicted by the Keynesian paradigm. From the point of view of the underlying paradigm, the most troubling criticism of the structural modeling approach comes in the form of “the Lucas critique” (1976) in which the author argued that the parameters of the “decision rules” embodied in the systems of structural equations would not remain stable when economic policies changed, even if the rules themselves were appropriate. Thus, the paradigm underlying the systems of equations approach to macroeconomic modeling is arguably fundamentally flawed. More recent research has reformulated the basic equations of macroeconomic models in terms of a microeconomic optimization foundation and has, at the same time, been much less ambitious in specifying the interrelationships among economic variables.

⁹ An extremely readable, nontechnical discussion of the paradigm shift in macroeconomic forecasting is given in Diebold (1998b). See also Stock and Watson (2001).

The preceding arguments have drawn researchers to less structured equation systems for forecasting. Thus, it is not just the form of the equations that has changed. The variables in the equations have changed as well; the VAR is not just the reduced form of some structural model. For purposes of analyzing and forecasting macroeconomic activity and tracing the effects of policy changes and external stimuli on the economy, researchers have found that simple, small-scale VARs without a possibly flawed theoretical foundation have proved as good as or better than large-scale structural equation systems. In addition to forecasting, VARs have been used for two primary functions, testing Granger causality and studying the effects of policy through impulse response characteristics.

19.6.1 MODEL FORMS

To simplify things for the present, we note that the $p$th order VAR can be written as a first-order VAR as follows:
$$\begin{bmatrix} y_t \\ y_{t-1} \\ \vdots \\ y_{t-p+1} \end{bmatrix} = \begin{bmatrix} \mu \\ 0 \\ \vdots \\ 0 \end{bmatrix} + \begin{bmatrix} \Gamma_1 & \Gamma_2 & \cdots & \Gamma_p \\ I & 0 & \cdots & 0 \\ \vdots & \ddots & & \vdots \\ 0 & \cdots & I & 0 \end{bmatrix} \begin{bmatrix} y_{t-1} \\ y_{t-2} \\ \vdots \\ y_{t-p} \end{bmatrix} + \begin{bmatrix} \varepsilon_t \\ 0 \\ \vdots \\ 0 \end{bmatrix}.$$
This means that we do not lose any generality in casting the treatment in terms of a first order model
$$y_t = \mu + \Gamma y_{t-1} + \varepsilon_t.$$
In Section 18.5, we examined Dahlberg and Johansson's model for municipal finances in Sweden, in which $y_t = [S_t, R_t, G_t]$ where $S_t$ is spending, $R_t$ is receipts and $G_t$ is grants from the central government, and $p = 3$. We will continue that application in Example 19.8 below.

In principle, the VAR model is a seemingly unrelated regressions model; indeed, it is a particularly simple one since each equation has the same set of regressors. This is the traditional form of the model as originally proposed, for example, by Sims (1980). The VAR may also be viewed as the reduced form of a simultaneous equations model; the corresponding structure would then be
$$\Theta y_t = \alpha + \Phi y_{t-1} + \omega_t$$
where $\Theta$ is a nonsingular matrix and $\mathrm{Var}[\omega] = \Sigma$. In one of Cecchetti and Rich's (2001) formulations, for example, $y_t = [\Delta y_t, \Delta\pi_t]'$ where $y_t$ is the log of aggregate real output, $\pi_t$ is the inflation rate from time $t-1$ to time $t$,
$$\Theta = \begin{bmatrix} 1 & -\theta_{12} \\ -\theta_{21} & 1 \end{bmatrix},$$
and $p = 8$. (We will examine their model in Section 19.6.8.) In this form, we have a conventional simultaneous equations model, which we analyzed in detail in Chapter 15. As we saw, in order for such a model to be identified (that is, estimable), certain restrictions must be placed on the structural coefficients. The reason for this is that ultimately, only the original VAR form, now the reduced form, is estimated from the data; the structural parameters must be deduced from these coefficients. In this model, in order to deduce these structural parameters, they must be extracted from the reduced form parameters, $\Gamma = \Theta^{-1}\Phi$, $\mu = \Theta^{-1}\alpha$, and $\Omega = \Theta^{-1}\Sigma(\Theta^{-1})'$. We analyzed this issue in detail in Section 15.3. The results would be the same here. In Cecchetti and Rich's application, certain restrictions were placed on the lag coefficients in order to secure identification.

19.6.2 ESTIMATION

In the form of (19-30), that is, without autocorrelation of the disturbances, VARs are particularly simple to estimate. Although the equation system can be exceedingly large, it is, in fact, a seemingly unrelated regressions model with identical regressors. As such, the equations should be estimated separately by ordinary least squares. (See Section 14.4.2 for discussion of SUR systems with identical regressors.) The disturbance covariance matrix can then be estimated with average sums of squares or cross-products of the least squares residuals. If the disturbances are normally distributed, then these least squares estimators are also maximum likelihood. If not, then OLS remains an efficient GMM estimator. The extension to instrumental variables and GMM is a bit more complicated, as the model now contains multiple equations (see Section 14.4), but since the equations are all linear, the necessary extensions are at least relatively straightforward. GMM estimation of the VAR system is a special case of the model discussed in Section 14.4. (We will examine an application below in Example 20.8.)

The proliferation of parameters in VARs has been cited as a major disadvantage of their use. Consider, for example, a VAR involving five variables and three lags. Each $\Gamma$ has 25 unconstrained e
lements, and there are three of them, for a total of 75 free parameters, plus any others in $\mu$, plus $5(6)/2 = 15$ free parameters in $\Omega$. On the other hand, each single equation has only 25 parameters, and at least given sufficient degrees of freedom (there's the rub) a linear regression with 25 parameters is simple work. Moreover, applications rarely involve even as many as four variables, so the model-size issue may well be exaggerated.

19.6.3 TESTING PROCEDURES

Formal testing in the VAR setting usually centers either on determining the appropriate lag length (a specification search) or on whether certain blocks of the coefficient matrices are zero (a simple linear restriction on the collection of slope parameters). Both types of hypotheses may be treated as sets of linear restrictions on the elements in $\gamma = \mathrm{vec}[\mu, \Gamma_1, \Gamma_2, \ldots, \Gamma_p]$.

We begin by assuming that the disturbances have a joint normal distribution. Let $W$ be the $M \times M$ residual covariance matrix based on a restricted model, and let $W^*$ be its counterpart when the model is unrestricted. Then the likelihood ratio statistic,
$$\lambda = T(\ln|W| - \ln|W^*|),$$
can be used to test the hypothesis. The statistic would have a limiting chi-squared distribution with degrees of freedom equal to the number of restrictions. In principle, one might base a specification search for the right lag length on this calculation. The procedure would be to test down from, say, lag $q$ to lag $p$. The general-to-simple principle discussed in Section 19.5.3 would be to set the maximum lag length and test down from it until deletion of the last set of lags leads to a significant loss of fit. At each step at which the alternative lag model has excess terms, the estimators of the superfluous coefficient matrices would have probability limits of zero and the likelihood function would (again, asymptotically) resemble that of the model with the correct number of lags. Formally, suppose the appropriate lag length is $p$ but the model is fit with $q \ge p + 1$ lagged terms. Then, under the null hypothesis,
$$\lambda_q = T[\ln|W(\mu, \Gamma_1, \ldots, \Gamma_{q-1})| - \ln|W^*(\mu, \Gamma_1, \ldots, \Gamma_q)|] \xrightarrow{d} \chi^2[M^2].$$
The same approach would be used to test other restrictions. Thus, the Granger causality test noted below would fit the model with and without certain blocks of zeros in the coefficient matrices, then refer the value of $\lambda$ once again to the chi-squared distribution.

For specification searches for the right lag, the suggested procedure may be less effective than one based on the information criteria suggested for other linear models (see Section 8.4). Lutkepohl (1993, pp. 128–135) suggests an alternative approach based on minimizing functions of the information criteria we have considered earlier;
$$\lambda^* = \ln(|W|) + (pM^2 + M)\,IC(T)/T$$
where $T$ is the sample size, $p$ is the number of lags, $M$ is the number of equations and $IC(T) = 2$ for the Akaike information criterion and $\ln T$ for the Schwarz (Bayesian) information criterion. We should note, this is not a test statistic; it is a diagnostic tool that we are using to conduct a specification search. Also, as in all such cases, the testing procedure should be from a larger one to a smaller one to avoid the misspecification problems induced by a lag length that is smaller than the appropriate one.

The preceding has relied heavily on the normality assumption. Since most recent applications of these techniques have either treated the least squares estimators as robust (distribution free) estimators, or used GMM (as we did in Chapter 18), it is necessary to consider a different approach that does not depend on normality. An alternative approach which should be robust to variations in the underlying distributions is the Wald statistic. [See Lutkepohl (1993, pp. 93–95).] The full set of coefficients in the model may be arrayed in a single coefficient vector, $\gamma$. Let $c$ be the sample estimator of $\gamma$ and let $V$ denote the estimated asymptotic covariance matrix. Then, the hypothesis in question (lag length, or other linear restriction) can be cast in the form $R\gamma - q = 0$. The Wald statistic for testing the null hypothesis is
$$W = (Rc - q)'[RVR']^{-1}(Rc - q).$$
Under the null hypothesis, this statistic has a limiting chi-squared distribution with degrees of freedom equal to $J$, the number of restrictions (rows in $R$). For the specification search for the appropriate lag length (or the Granger causality test discussed in the next section), the null hypothesis will be that a certain subvector of $\gamma$, say $\gamma_0$, equals zero. In this case, the statistic will be
$$W_0 = c_0'V_{00}^{-1}c_0$$
where $V_{00}$ denotes the corresponding submatrix of $V$.

Since time series data sets are often only moderately long, use of the limiting distribution for the test statistic may be a bit optimist
ic. Also, the Wald statistic does not account for the fact that the asymptotic covariance matrix is estimated using a finite sample. In our analysis of the classical linear regression model, we accommodated these considerations by using the F distribution instead of the limiting chi-squared. (See Section 6.4.) The adjustment made was to refer $W/J$ to the $F[J, T - K]$ distribution. This produces a more conservative test; the corresponding critical values of $JF$ converge to those of the chi-squared from above. A remaining complication is to decide what degrees of freedom to use for the denominator. It might seem natural to use $MT$ minus the number of parameters, which would be correct if the restrictions are imposed on all equations simultaneously, since there are that many “observations.” In testing for causality, as in Section 19.6.5 below, Lutkepohl (1993, p. 95) argues that $MT$ is excessive, since the restrictions are not imposed on all equations. When the causality test involves testing for zero restrictions within a single equation, the appropriate degrees of freedom would be $T - Mp - 1$ for that one equation.

19.6.4 EXOGENEITY

In the classical regression model with nonstochastic regressors, there is no ambiguity about which is the independent or conditioning or “exogenous” variable in the model
$$y_t = \beta_1 + \beta_2 x_t + \varepsilon_t. \tag{19-31}$$
This is the kind of characterization that might apply in an experimental situation in which the analyst is choosing the values of $x_t$. But, the case of nonstochastic regressors has little to do with the sort of modeling that will be of interest in this and the next chapter. There is no basis for the narrow assumption of nonstochastic regressors, and, in fact, in most of the analysis that we have done to this point, we have left this assumption far behind. With stochastic regressor(s), the regression relationship such as the one above becomes a conditional mean in a bivariate distribution. In this more realistic setting, what constitutes an “exogenous” variable becomes ambiguous. Assuming that the regression relationship is linear, (19-31) can be written (trivially) as
$$y_t = E[y_t \mid x_t] + (y_t - E[y_t \mid x_t])$$
where the familiar moment condition $E[x_t\varepsilon_t] = 0$ follows by construction. But, this form of the model is no more the “correct” equation than would be
$$x_t = \delta_1 + \delta_2 y_t + \omega_t$$
which is (we assume)
$$x_t = E[x_t \mid y_t] + (x_t - E[x_t \mid y_t])$$
and now, $E[y_t\omega_t] = 0$. Since both equations are correctly specified in the context of the bivariate distribution, there is nothing to define one variable or the other as “exogenous.” This might seem puzzling, but it is, in fact, at the heart of the matter when one considers modeling in a world in which variables are jointly determined. The definition of exogeneity depends on the analyst's understanding of the world they are modeling, and, in the final analysis, on the purpose to which the model is to be put.

The methodological platform on which this discussion rests is the classic paper by Engle, Hendry, and Richard (1983) where they point out that exogeneity is not an absolute concept at all; it is defined in the context of the model. The central idea, which will be very useful to us here, is that we define a variable (set of variables) as exogenous in the context of our model if the joint density may be written
$$f(y_t, x_t) = f(y_t \mid \beta, x_t) \times f(\theta, x_t)$$
where the parameters in the conditional distribution do not appear in and are functionally unrelated to those in the marginal distribution of $x_t$. By this arrangement, we can think of “autonomous variation” of the parameters of interest, $\beta$. The parameters in the conditional model for $y_t \mid x_t$ can be analyzed as if they could vary independently of those in the marginal distribution of $x_t$. If this condition does not hold, then we cannot think of variation of those parameters without linking that variation to some effect in the marginal distribution of $x_t$. In this case, it makes little sense to think of $x_t$ as somehow being determined “outside” the (conditional) model. (We considered this issue in Section 15.8 in the context of a simultaneous equations model.)

A second form of exogeneity we will consider is strong exogeneity, which is sometimes called Granger noncausality. Granger noncausality can be superficially defined by the assumption
$$E[y_t \mid y_{t-1}, x_{t-1}, x_{t-2}, \ldots] = E[y_t \mid y_{t-1}].$$
That is, lagged values of $x_t$ do not provide information about the conditional mean of $y_t$ once lagged values of $y_t$, itself, are accounted for. We will consider this issue at the end of this chapter. For the present, we note that most of the models we will examine will explicitly fail this assumption.

To put this back in the context of our model, we will be assuming that in the model
$$y_t = \beta_1 + \beta_2 x_t + \beta_3 x_{t-1} + \gamma y_{t-1} + \varepsilon_t$$
and the extensions that we will consider, $x_t$ is weakly exogenous (we can meaningfully estimate the parameters of the regression equation independently of the marginal distribution of $x_t$), but we will allow for Granger causality between $x_t$ and $y_t$, thus generally not assuming strong exogeneity.

19.6.5 TESTING FOR GRANGER CAUSALITY

Causality in the sense defined by Granger (1969) and Sims (1972) is inferred when lagged values of a variable, say $x_t$, have explanatory power in a regression of a variable $y_t$ on lagged values of $y_t$ and $x_t$. (See Section 15.2.2.) The VAR can be used to test the hypothesis.¹⁰ Tests of the restrictions can be based on simple F tests in the single equations of the VAR model. That the unrestricted equations have identical regressors means that these tests can be based on the results of simple OLS estimates. The notion can be extended in a system of equations to attempt to ascertain if a given variable is weakly exogenous to the system. If lagged values of a variable $x_t$ have no explanatory power for any of the variables in a system, then we would view $x$ as weakly exogenous to the system. Once again, this specification can be tested with a likelihood ratio test as described below (the restriction will be to put “holes” in one or more $\Gamma$ matrices) or with a form of F test constructed by stacking the equations.

Example 19.7 Granger Causality¹¹
All but one of the major recessions in the U.S. economy since World War II have been preceded by large increases in the price of crude oil. Does movement of the price of oil cause movements in U.S. GDP in the Granger sense? Let $y_t = [\text{GDP}, \text{crude oil price}]_t'$. Then, a simple VAR would be
$$y_t = \begin{bmatrix} \mu_1 \\ \mu_2 \end{bmatrix} + \begin{bmatrix} \alpha_1 & \alpha_2 \\ \beta_1 & \beta_2 \end{bmatrix} y_{t-1} + \begin{bmatrix} \varepsilon_{1t} \\ \varepsilon_{2t} \end{bmatrix}.$$
To assert a causal relationship between oil prices and GDP, we must find that $\alpha_2$ is not zero; previous movements in oil prices do help explain movements in GDP even in the presence of the lagged value of GDP. Consistent with our earlier discussion, this fact, in itself, is not sufficient to assert a causal relationship. We would also have to demonstrate that there were no other intervening explanations that would explain movements in oil prices and GDP. (We will examine a more extensive application in Example 19.9.)

To establish the general result, it will prove useful to write the VAR in the multivariate reg
ression format we used in Section 14.4.2. Partition the two data vectors $y_t$ and $x_t$ into $[y_{1t}, y_{2t}]$ and $[x_{1t}, x_{2t}]$. Consistent with our earlier discussion, $x_1$ is lagged values of $y_1$ and $x_2$ is lagged values of $y_2$. The VAR with this partitioning would be
$$\begin{bmatrix} y_1 \\ y_2 \end{bmatrix}_t = \begin{bmatrix} \Gamma_{11} & \Gamma_{12} \\ \Gamma_{21} & \Gamma_{22} \end{bmatrix} \begin{bmatrix} x_1 \\ x_2 \end{bmatrix}_t + \begin{bmatrix} \varepsilon_1 \\ \varepsilon_2 \end{bmatrix}_t, \quad \mathrm{Var}\begin{bmatrix} \varepsilon_1 \\ \varepsilon_2 \end{bmatrix}_t = \begin{bmatrix} \Omega_{11} & \Omega_{12} \\ \Omega_{21} & \Omega_{22} \end{bmatrix}.$$
We would still obtain the unrestricted maximum likelihood estimates by least squares regressions. For testing Granger causality, the hypothesis $\Gamma_{12} = 0$ is of interest. (See Example 19.7.) This model is the block of zeros case examined in Section 14.2.6. The full set of results we need are derived there. For testing the hypothesis of interest, $\Gamma_{12} = 0$, the second set of equations is irrelevant. For testing for Granger causality in the VAR model, only the restricted equations are relevant. The hypothesis can be tested using the likelihood ratio statistic. For the present application, testing means computing

$S_{11}$ = residual covariance matrix when current values of $y_1$ are regressed on values of both $x_1$ and $x_2$,
$S_{11}(0)$ = residual covariance matrix when current values of $y_1$ are regressed only on values of $x_1$.

The likelihood ratio statistic is then
$$\lambda = T(\ln|S_{11}(0)| - \ln|S_{11}|).$$
The number of degrees of freedom is the number of zero restrictions.

¹⁰ See Geweke, Meese, and Dent (1983), Sims (1980), and Stock and Watson (2001).
¹¹ This example is adapted from Hamilton (1994, pp. 307–308).

As discussed earlier, the fact that this test is wedded to the normal distribution limits its generality. The Wald test or its transformation to an approximate F statistic as described in Section 19.6.3 is an alternative that should be more generally applicable. When the equation system is fit by GMM, as in Example 19.8, the simplicity of the likelihood ratio test is lost. The Wald statistic remains usable, however. Another possibility is to use the GMM counterpart to the likelihood ratio statistic (see Section 18.4.2) based on the GMM criterion functions. This is just the difference in the GMM criteria. Fitting both restricted and unrestricted models in this framework may be burdensome, but having set up the GMM estimator for the (larger) unrestricted model, imposing the zero restrictions of the smaller model should require only a minor modification.

There is a complication in these causality tests. The VAR can be motivated by the Wold representation theorem (see Section 20.2.5, Theorem 20.1), although with assumed nonautocorrelated disturbances, the motivation is incomplete. On the other hand, there is no formal theory behind the formulation. As such, the causality tests are predicated on a model that may, in fact, be missing either intervening variables or additional lagged effects that should be present but are not. For the first of these, the problem is that a finding of causal effects might equally well result from the omission of a variable that is correlated with both of (or all) the left-hand-side variables.

19.6.6 IMPULSE RESPONSE FUNCTIONS

Any VAR can be written as a first-order model by augmenting it, if necessary, with additional identity equations. For example, the model
$$y_t = \mu + \Gamma_1 y_{t-1} + \Gamma_2 y_{t-2} + v_t$$
can be written
$$\begin{bmatrix} y_t \\ y_{t-1} \end{bmatrix} = \begin{bmatrix} \mu \\ 0 \end{bmatrix} + \begin{bmatrix} \Gamma_1 & \Gamma_2 \\ I & 0 \end{bmatrix} \begin{bmatrix} y_{t-1} \\ y_{t-2} \end{bmatrix} + \begin{bmatrix} v_t \\ 0 \end{bmatrix},$$
which is a first-order model. We can study the dynamic characteristics of the model in either form, but the second is more convenient, as will soon be apparent.

As we analyzed earlier, in the model
$$y_t = \mu + \Gamma y_{t-1} + v_t,$$
dynamic stability is achieved if the characteristic roots of $\Gamma$ have modulus less than one. (The roots may be complex, because $\Gamma$ need not be symmetric. See Section 19.4.3 for the case of a single equation and Section 15.9 for analysis of essentially this model in a simultaneous-equations context.)

Assuming that the equation system is stable, the equilibrium is found by obtaining the final form of the system. We can do this step by repeated substitution, or more simply by using the lag operator to write
$$y_t = \mu + \Gamma(L)y_t + v_t$$
or
$$[I - \Gamma(L)]y_t = \mu + v_t.$$
With the stability condition, we have
$$\begin{aligned} y_t &= [I - \Gamma(L)]^{-1}(\mu + v_t) \\ &= (I - \Gamma)^{-1}\mu + \sum_{i=0}^{\infty}\Gamma^i v_{t-i} \\ &= \bar{y} + \sum_{i=0}^{\infty}\Gamma^i v_{t-i} \\ &= \bar{y} + v_t + \Gamma v_{t-1} + \Gamma^2 v_{t-2} + \cdots. \end{aligned} \tag{19-32}$$
The coefficients in the powers of $\Gamma$ are the multipliers in the system. In fact, by renaming things slightly, this set of results is precisely the one we examined in Section 15.9 in our discussion of dynamic simultaneous-equations models. We will change the interpretation slightly here, however. As we did in Section 15.9, we consider the conceptual experiment of disturbing a system in equilibrium. Suppose that $v$ has equaled 0 for long enough that $y$ has reached equilibrium, $\bar{y}$. Now we consider injecting a shock to the system by changing one of the $v$'s, for one period, and then returning it to zero thereafter. As we saw earlier, $y_{mt}$ will move away from, then return to, its equilibrium. The path whereby the variables return to the equilibrium is called the impulse response of the VAR.¹²

¹² See Hamilton (1994, pp. 318–323 and 336–350) for discussion and a number of related results.

In the autoregressive form of the model, we can identify each innovation, $v_{mt}$, with a particular variable in $y_t$, say $y_{mt}$. Consider then the effect of a one-time shock to the system, $dv_{mt}$. As compared with the equilibrium, we will have, in the current period,
$$y_{mt} - \bar{y}_m = dv_{mt} = \phi_{mm}(0)\,dv_{mt}.$$
One period later, we will have
$$y_{m,t+1} - \bar{y}_m = (\Gamma)_{mm}\,dv_{mt} = \phi_{mm}(1)\,dv_{mt}.$$
Two periods later,
$$y_{m,t+2} - \bar{y}_m = (\Gamma^2)_{mm}\,dv_{mt} = \phi_{mm}(2)\,dv_{mt},$$
and so on. The function, $\phi_{mm}(i)$, gives the impulse response characteristics of variable $y_m$ to innovations in $v_m$. A useful way to characterize the system is to plot the impulse response functions. The preceding traces through the effect on variable $m$ of a one-time innovation in $v_m$. We could also examine the effect of a one-time innovation of $v_l$ on variable $m$. The impulse response function would be
$$\phi_{ml}(i) = \text{element } (m, l) \text{ in } \Gamma^i.$$
Point estimation of $\phi_{ml}(i)$ using the estimated model parameters is straightforward. Confidence intervals present a more difficult problem because the estimated functions $\hat{\phi}_{ml}(i, \hat{\beta})$ are so highly nonlinear in the original parameter estimates. The delta method has thus proved unsatisfactory. Kilian (1998) presents results that suggest that bootstrapping may be the more productive approach to statistical inference regarding impulse response functions.

19.6.7 STRUCTURAL VARs

The VAR approach to modeling dynamic behavior of economic variables has provided some interesting insights and appears [see Litterman (1986)] to bring some real benefits for forecasting. The method has received some strident criticism for its atheoretical approach, however. The “unrestricted” nature of the lag structure in (19-30) could be synonymous with “unstructured.” With no theoretical input to the model, it is difficult to claim that its output provides much of a theoretically justified result. For example, how are we to interpret the impulse response functions derived in the previous section? What lies behind m
uch of this discussion is the idea that there is, in fact, a structure underlying the model, and the VAR that we have specified is a mere hodgepodge of all its components. Of course, that is exactly what reduced forms are. As such, to respond to this sort of criticism, analysts have begun to cast VARs formally as reduced forms and thereby attempt to deduce the structure that they had in mind all along.

A VAR model $y_t = \mu + \Gamma y_{t-1} + v_t$ could, in principle, be viewed as the reduced form of the dynamic structural model
$$\Theta y_t = \alpha + \Phi y_{t-1} + \varepsilon_t,$$
where we have embedded any exogenous variables $x_t$ in the vector of constants $\alpha$. Thus, $\Gamma = \Theta^{-1}\Phi$, $\mu = \Theta^{-1}\alpha$, $v = \Theta^{-1}\varepsilon$, and $\Omega = \Theta^{-1}\Sigma(\Theta^{-1})'$. Perhaps it is the structure, specified by an underlying theory, that is of interest. For example, we can discuss the impulse response characteristics of this system. For particular configurations of $\Theta$, such as a triangular matrix, we can meaningfully interpret innovations, $\varepsilon$. As we explored at great length in the previous chapter, however, as this model stands, there is not sufficient information contained in the reduced form as just stated to deduce the structural parameters. A possibly large number of restrictions must be imposed on $\Theta$, $\Phi$, and $\Sigma$ to enable us to deduce structural forms from reduced-form estimates, which are always obtainable. The recent work on “structural VARs” centers on the types of restrictions and forms of the theory that can be brought to bear to allow this analysis to proceed. See, for example, the survey in Hamilton (1994, Chapter 11). At this point, the literature on this subject has come full circle because the contemporary development of “unstructured VARs” becomes very much the analysis of quite conventional dynamic structural simultaneous equations models. Indeed, current research [e.g., Diebold (1998a)] brings the literature back into line with the structural modeling tradition by demonstrating how VARs can be derived formally as the reduced forms of dynamic structural models. That is, the most recent applications have begun with structures and derived the reduced forms as VARs, rather than departing from the VAR as a reduced form and attempting to deduce a structure from it by layering on restrictions.

19.6.8 APPLICATION: POLICY ANALYSIS WITH A VAR

Cecchetti and Rich (2001) used a structural VAR to analyze the effect of recent disinflationary policies of the Fed on aggregate output in the U.S. economy. The Fed's policy of the last two decades has leaned more toward controlling inflation and less toward stimulation of the economy. The authors argue that the long-run benefits of this policy include economic stability and increased long-term trend output growth. But, there is a short-term cost in lost output. Their study seeks to estimate the “sacrifice ratio,” which is a measure of the cumulative cost of this policy. The specific indicator they study measures the cumulative output loss after $\tau$ periods of a policy shock at time $t$, where the (persistent) shock is measured as the change in the level of inflation.

19.6.8a A VAR Model for the Macroeconomic Variables

The model proposed for estimating the ratio is a structural VAR,
$$\Delta y_t = \sum_{i=1}^{p} b_{11}^{i}\,\Delta y_{t-i} + b_{12}^{0}\,\Delta\pi_t + \sum_{i=1}^{p} b_{12}^{i}\,\Delta\pi_{t-i} + \varepsilon_t^{y}$$
$$\Delta\pi_t = b_{21}^{0}\,\Delta y_t + \sum_{i=1}^{p} b_{21}^{i}\,\Delta y_{t-i} + \sum_{i=1}^{p} b_{22}^{i}\,\Delta\pi_{t-i} + \varepsilon_t^{\pi}$$
where $y_t$ is aggregate real output in period $t$ and $\pi_t$ is the rate of inflation from period $t-1$ to $t$ and the model is cast in terms of rates of changes of these two variables. (Note, therefore, that sums of $\Delta\pi_t$ measure accumulated changes in the rate of inflation, not changes in the CPI.) The innovation vector, $\varepsilon_t = (\varepsilon_t^{y}, \varepsilon_t^{\pi})'$, is assumed to have mean 0, contemporaneous covariance matrix $E[\varepsilon_t\varepsilon_t'] = \Lambda$ and to be strictly nonautocorrelated. (We have retained Cecchetti and Rich's notation for most of this discussion, save for the number of lags, which is denoted $n$ in their paper and $p$ here, and some other minor changes which will be noted in passing where necessary.)¹³ The equation system may also be written
$$B(L)\begin{bmatrix} \Delta y_t \\ \Delta\pi_t \end{bmatrix} = \begin{bmatrix} \varepsilon_t^{y} \\ \varepsilon_t^{\pi} \end{bmatrix}$$
where $B(L)$ is a $2 \times 2$ matrix of polynomials in the lag operator. The components of the disturbance (innovation) vector $\varepsilon_t$ are identified as shocks to aggregate supply and aggregate demand respectively.

¹³ The authors examine two other VAR models, a three-equation model of Shapiro and Watson (1988), which adds an equation in real interest rates $(i_t - \pi_t)$ and a four-equation model by Gali (1992), which models $y_t$, $i_t$, $(i_t - \pi_t)$, and the real money stock, $(m_t - \pi_t)$. Among the foci of Cecchetti and Rich's paper was the surprisingly large variation in estimates of the sacrifice ratio produced by the three models. In the interest of brevity, we will restrict our analysis to Cecchetti's (1994) two-equation model.

19.6.8b The Sacrifice Ratio

Interest in the study centers on the impact over time of structural shocks to output and the rate of inflation. In order to calculate these, the authors use the vector moving average (VMA) form of the model, which would be
$$\begin{bmatrix} \Delta y_t \\ \Delta\pi_t \end{bmatrix} = [B(L)]^{-1}\begin{bmatrix} \varepsilon_t^{y} \\ \varepsilon_t^{\pi} \end{bmatrix} = A(L)\begin{bmatrix} \varepsilon_t^{y} \\ \varepsilon_t^{\pi} \end{bmatrix} = \begin{bmatrix} A_{11}(L) & A_{12}(L) \\ A_{21}(L) & A_{22}(L) \end{bmatrix}\begin{bmatrix} \varepsilon_t^{y} \\ \varepsilon_t^{\pi} \end{bmatrix} = \begin{bmatrix} \sum_{i=0}^{\infty} a_{11}^{i}\varepsilon_{t-i}^{y} + \sum_{i=0}^{\infty} a_{12}^{i}\varepsilon_{t-i}^{\pi} \\ \sum_{i=0}^{\infty} a_{21}^{i}\varepsilon_{t-i}^{y} + \sum_{i=0}^{\infty} a_{22}^{i}\varepsilon_{t-i}^{\pi} \end{bmatrix}.$$
(Note that the superscript “$i$” in the last form of the model above is not an exponent; it is the index of the sequence of coefficients.) The impulse response functions for the model corresponding to (19-30) are precisely the coefficients in $A(L)$. In particular, the effect on the change in inflation $\tau$ periods later of a change in $\varepsilon_t^{\pi}$ in period $t$ is $a_{22}^{\tau}$. The total effect from time $t + 0$ to time $t + \tau$ would be the sum of these, $\sum_{i=0}^{\tau} a_{22}^{i}$. The counterparts for the rate of output would be $\sum_{i=0}^{\tau} a_{12}^{i}$. However, what is needed is not the effect only on period $\tau$'s output, but the cumulative effect on output from the time of the shock up to period $\tau$. That would be obtained by summing these period specific effects, to obtain $\sum_{i=0}^{\tau}\sum_{j=0}^{i} a_{12}^{j}$. Combining terms, the sacrifice ratio is
$$S_{\varepsilon^{\pi}}(\tau) = \frac{\sum_{j=0}^{\tau} \partial y_{t+j}/\partial\varepsilon_t^{\pi}}{\partial\pi_{t+\tau}/\partial\varepsilon_t^{\pi}} = \frac{\sum_{i=0}^{0} a_{12}^{i} + \sum_{i=0}^{1} a_{12}^{i} + \cdots + \sum_{i=0}^{\tau} a_{12}^{i}}{\sum_{i=0}^{\tau} a_{22}^{i}} = \frac{\sum_{i=0}^{\tau}\sum_{j=0}^{i} a_{12}^{j}}{\sum_{i=0}^{\tau} a_{22}^{i}}.$$
The function $S(\tau)$ is then examined over long periods to study the long term effects of monetary policy.

19.6.8c Identification and Estimation of a Structural VAR Model

Estimation of this model requires some manipulation. The structural model is a conventional linear simultaneous equations model of the form
$$B_0 y_t = B x_t + \varepsilon_t$$
where $y_t$ is $(\Delta y_t, \Delta\pi_t)'$ and $x_t$ is the lagged values on the right-hand side. As we saw in Section 15.3.1, without further restrictions, a model such as this is not identified (estimable). A total of $M^2$ restrictions ($M$ is the number of equations, here two) are needed to identify the model. In the familiar cases of simultaneous-equations models that we examined in Chapter 15, identification is usually secured through exclusion restrictions, that is zero restrictions, either in $B_0$ or $B$. This type of exclusion restriction would be unnatural in a model such as this one; there would be no basis for poking specific holes in the coefficient matrices. The authors take a different approach, which requires us to look more closely at the different forms the time-series model can take. Write the structural form as
$$B_0 y_t = B_1 y_{t-1} + B_2 y_{t-2} + \cdots + B_p y_{t-p} + \varepsilon_t$$
where
$$B_0 = \begin{bmatrix} 1 & -b_{12}^{0} \\ -b_{21}^{0} & 1 \end{bmatrix}.$$
As noted, this is in the form of a conventional simultaneous equations model. Assuming that $B_0$ is nonsingular, which for this two-equation system requires only that $1 - b_{12}^{0}b_{21}^{0}$ not equal zero, we can obtain the reduced form of the model as
$$\begin{aligned} y_t &= B_0^{-1}B_1 y_{t-1} + B_0^{-1}B_2 y_{t-2} + \cdots + B_0^{-1}B_p y_{t-p} + B_0^{-1}\varepsilon_t \\ &= D_1 y_{t-1} + D_2 y_{t-2} + \cdots + D_p y_{t-p} + \mu_t \end{aligned} \tag{19-33}$$
where $\mu_t$ is the vector of reduced form innovations. Now, collect the terms in the equivalent form
$$[I - D_1 L - D_2 L^2 - \cdots]y_t = \mu_t.$$
The moving average form that we obtained earlier is
$$y_t = [I - D_1 L - D_2 L^2 - \cdots]^{-1}\mu_t.$$
Assuming stability of the system, we can also write this as
$$\begin{aligned} y_t &= [I - D_1 L - D_2 L^2 - \cdots]^{-1}\mu_t \\ &= [I - D_1 L - D_2 L^2 - \cdots]^{-1}B_0^{-1}\varepsilon_t \\ &= [I + C_1 L + C_2 L^2 + \cdots]\mu_t \\ &= \mu_t + C_1\mu_{t-1} + C_2\mu_{t-2} + \cdots \\ &= B_0^{-1}\varepsilon_t + C_1\mu_{t-1} + C_2\mu_{t-2} + \cdots. \end{aligned}$$
So, the $C_j$ matrices correspond to our $A_j$ matrices in the original formulation. But, this manipulation has added something. We can see that $A_0 = B_0^{-1}$. Looking ahead, the reduced form equations can be estimated by least squares. Whether the structural parameters, and thereafter, the VMA parameters can as well depends entirely on whether $B_0$ can be estimated. From (19-33) we can see that if $B_0$ can be estimated, then $B_1, \ldots, B_p$ can also just by premultiplying the reduced form coefficient matrices by this estimated $B_0$. So, we must now consider this issue. (This is precisely the conclusion we drew at the beginning of Section 15.3.)

Recall the initial assumption that $E[\varepsilon_t\varepsilon_t'] = \Lambda$. In the reduced form, we assume $E[\mu_t\mu_t'] = \Sigma$. As we know, reduced forms are always estimable (indeed, by least squares if the assumptions of the model are correct). That means that $\Sigma$ is estimable by the least squares residual variances and covariance. From the earlier derivation, we have that $\Sigma = B_0^{-1}\Lambda(B_0^{-1})' = A_0\Lambda A_0'$. (Again, see the beginning of Section 15.3.) The authors have secured identification of the model through this relationship. In particular, they assume first that $\Lambda = I$. Assuming that $\Lambda = I$, we now have that $A_0A_0' = \Sigma$, where $\Sigma$ is an estimable matrix with three free parameters. Since $A_0$ is $2 \times 2$, one more restriction is needed to secure identification. At this point, the authors, invoking Blanchard and Quah (1989), assume that “demand shocks have no permanent effect on the level of output. Thi
sisequivalenttoA12(1)=i=0a12=0.”Thismightseemlikeacumbersomerestrictiontoimpose.But,thematrixA(1)is[I−D−D−···−D]−1A=FAand12p00thecomponents,Djhavebeenestimatedasthereducedformcoefficientmatrices,soA12(1)=0assumesonlythattheupperrightelementofthismatrixiszero.WenowobtaintheequationsneededtosolveforA0.First,22!a0+a0a0a0+a0a0111211211222=σ11σ12A0A0=⇒22σσ(19-34)a0a0+a0a0a0+a01211112112222122\nGreene-50240bookJune26,200221:55CHAPTER19✦ModelswithLaggedVariables599whichprovidesthreeequations.Second,thetheoreticalrestrictionis!!∗fa0+fa011121222∗0FA0==.∗∗∗∗ThisprovidesthefourequationsneededtoidentifythefourelementsinA.140Collectingresults,theestimationstrategyisfirsttoestimateD1,...Dpandinthereducedform,byleastsquares.(Theysetp=8.)Thenusetherestrictionsand(19-34)−1−1toobtaintheelementsofA0=B0and,finally,Bj=A0Dj.Thelaststepisestimationofthematricesofimpulseresponses,whichcanbedoneasfollows:Wereturntothereducedformwhich,usingouraugmentationtrick,wewriteasyD1D2···DpyA0εttt−1yI0···0y0t−1=t−2+.(19-35)············0······yt−p+10···I0yt−p0Forconvenience,arrangethisresultasYt=(DL)Yt+wt.Now,solvethisforYttoobtainthefinalformY=[I−DL]−1w.ttWritethisinthespectralformandexpandaswedidearlier,toobtain∞Y=PiQw.(19-36)tt−ii=014Atthispoint,anintriguinglooseendarises.WehavecarriedthisdiscussionintheformoftheoriginalpapersbyBlanchardandQuah(1989)andCecchettiandRich(2001).Returningtotheoriginalstructure,however,weseethatsinceA−1,itactuallydoesnothavefourunrestrictedandunknownelements;0=B0ithastwo.Themodelisoveridentified.Wecouldhavepredictedthisattheoutset.Asinourconventionalsimultaneousequationsmodel,thenormalizationsinB0(onesonthediagonal)providetworestrictionsoftheM2=4required.Assumingthat=Iprovidesthreemore,andthetheoreticalrestrictionprovidesasixth.Therefore,thefourunknownelementsinanunrestrictedB0areoveridentified.Theassumptionthat=I,initself,maybeasubstantive,andstrongrestriction.IntheoriginaldatathatCecchettiandRichused,overtheperiodoftheirestimation,theunconditionalvariancesofytandπtare0.
923and0.676.Thelatterisfarenoughbelowonethatonemightexpectthisassumptionactuallytobesubstantive.Itmightseemconvenientatthispointtoforegothetheoreticalrestrictiononlong-termimpacts,butitseemsmorenaturaltoomittherestrictionsonthescalingof.Withthetwonormalizationsalreadyinplace,assumingthattheinnovationsareuncorrelated(isdiagonal)and“demandshockshavenopermanenteffectonthelevelofoutput”togethersufficetoidentifythemodel.BlanchardandQuahappeartoreachthesameconclusion(page656),butthentheyalsoassumetheunitvariances[page657,equation(1).]Theyarguethattheassumptionofunitvariancesisjustaconvenientnormalization,butthisisnotthecase.Sincethemodelisalreadyidentifiedwithouttheassumption,thescalingrestrictionissubstantive.Onceagain,thisisclearfromalookatthestructure.TheassumptionthatB0hasonesonitsdiagonalhasalreadyscaledtheequation.Infact,thisislogicallyidenticaltoassumingthatthedisturbanceinaconventionalregressionmodelhasvarianceone,whichonenormallywouldnotdo.\nGreene-50240bookJune26,200221:55600CHAPTER19✦ModelswithLaggedVariablesWewillbeinterestedintheuppermostsubvectorofYt,soweexpand(19-36)toyieldytA0εt−i∞yt−1=PiQ0.······i=0yt−p+10ThematrixinthesummationisMp×Mp.TheimpactmatricesweseekaretheM×Mmatricesintheupperleftcornerofthespectralform,multipliedbyA0.19.6.8dInferenceAsnotedattheendofSection19.6.6,obtainingusablestandarderrorsforestimatesofimpulseresponsesisadifficult(asyetunresolved)problem.Killian(1998)hassuggestedthatbootstrappingisapreferableapproachtousingthedeltamethod.CecchettiandRichreachthesameconclusion,andlikewiseresorttoabootstrappingprocedure.Theirbootstrapprocedureiscarriedoutasfollows:Letδˆandˆdenotethefullsetofestimatedcoefficientsandestimatedreducedformcovariancematrixbasedondirectestimation.AssuggestedbyDoan(1996),theyconstructasequenceofNdrawsforthereducedformparameters,thenrecomputetheentiresetofimpulseresponses.Thenarrowestintervalwhichcontains90percentofthesedrawsistakentobeaconfidenceintervalforanestimatedimpulsefunction.19.6.8eEmpiricalResultsCecchettiandR
ichusedquarterlyobservationsonrealaggregateoutputandthecon-sumerpriceindex.Theirdatasetspanned1959.1to1997.4.ThisisasubsetofthedatadescribedintheAppendixTableF5.1.Beforebeginningtheiranalysis,theysub-jectedthedatatothestandardtestsforstationarity.Figures19.5through19.7showFIGURE19.5LogGDP.LogRealGDP,1959.1–1997.49.29.08.88.68.4LOGGDP8.28.07.87.6195819631968197319781983198819931998Quarter\nGreene-50240bookJune26,200221:55CHAPTER19✦ModelswithLaggedVariables601InflationRate,1959.1–1997.4.05.04.03.02INFL.01.00.01195819631968197319781983198819931998QuarterFIGURE19.6TheQuarterlyRateofInflation.FIGURE19.7RatesofChange,logGDPandtheRateofInflation.FirstDifferencesoflogGDPandInflation4.0DLOGYDPI2.71.5Variable.21.02.3195919641969197419791984198919941999Quarter\nGreene-50240bookJune26,200221:55602CHAPTER19✦ModelswithLaggedVariablesthelogofrealoutput,therateofinflation,andthechangesinthesetwovariables.Thefirsttwofiguresdosuggestthatneithervariableisstationary.OnthebasisoftheDickey–Fuller(1981)test(seeSection20.3),theyfound(asmightbeexpected)thattheytandπtseriesbothcontainunitroots.Theyconcludethatsinceoutputhasaunitroot,theidentificationrestrictionthatthelongruneffectofaggregatedemandshocksonoutputiswelldefinedandmeaningful.Theunitrootininflationallowsforpermanentshiftsinitslevel.Thelaglengthforthemodelissetatp=8.Long-runimpulseresponsefunctionaretruncatedat20years(80quarters).AnalysisisbasedontherateofchangedatashowninFigure19.7.Asafinalcheckonthemodel,theauthorsexaminedthedataforthepossibilityofastructuralshiftusingthetestsdescribedinSection7.5.NoneoftheAndrews/QuandtsupremumLMtest,Andrews/PlobergerexponentialLMtest,ortheAndrews/PlobergeraverageLMtestsuggestedthattheunderlyingstructurehadchanged(inspiteofwhatseemslikelytohavebeenamajorshiftinFedpolicyinthe1970s).Onthisbasis,theyconcludedthattheVARisstableoverthesampleperiod.Figure19.8(Figures3Aand3Btakenfromthearticle)showstheirtwoseparateestimatedimpulseresponsefunctions.Thedottedlinesinthefiguresshowthebootstrapgeneratedconf
idencebounds.EstimatesofthesacrificeratioforCecchetti’smodelare1.3219forτ=4,1.3204forτ=8,1.5700forτ=12,1.5219forτ=16,and1.3763forτ=20.TheauthorsalsoexaminedtheforecastingperformanceoftheirmodelcomparedtoShapiroandWatson’sandGali’s.Thedeviceusedwastoproduceonestepahead,periodT+1|Tforecastsforthemodelestimatedusingperiods1...,T.Thefirstreducedformofthemodelisfitusing1959.1to1975.1andusedtoforecast1975.2.Then,itisreestimatedusing1959.1to1975.2andusedtoforecast1975.3,andsoon.Finally,therootmeansquarederroroftheseoutofsampleforecastsiscomparedforthreemodels.Ineachcase,thelevel,ratherthantherateofchangeoftheinflationrateisforecasted.Overall,theresultssuggestthatthesmallermodeldoesabetterjobofestimatingtheimpulseresponses(hassmallerconfidenceboundsandconformsmorenearlywiththeoreticalpredictions)butperformsworstofthethree(slightly)intermsofthemeansquarederroroftheout-of-sampleforecasts.Sincetheunrestrictedreducedformmodelisbeingusedforthelatter,thiscomesasnosurprise.Theendresultfollowsessentiallyfromtheresultthataddingvariablestoaregressionmodelimprovesitsfit.19.6.9VARsINMICROECONOMICSVARshaveappearedinthemicroeconometricsliteratureaswell.Chamberlain(1980)suggestedthatausefulapproachtotheanalysisofpaneldatawouldbetotreateachperiod’sobservationasaseparateequation.ForthecaseofT=2,wewouldhaveyi1=αi+βxi1+εi1,yi2=αi+βxi2+εi2,whereiindexesindividualsandαiareunobservedindividualeffects.Thisspecificationproducesamultivariateregression,towhichChamberlainaddedrestrictionsrelatedtotheindividualeffects.Holtz-Eakin,Newey,andRosen’s(1988)approachistospecify\nGreene-50240bookJune26,200221:55CHAPTER19✦ModelswithLaggedVariables603A:DynamicResponsetoaMonetaryPolicyShockRealGDP—Cecchetti0.60.40.20.00.20.4Log0.60.81.01.21.405101520B:DynamicResponsetoaMonetaryPolicyShockInflation—Cecchetti0.750.500.250.000.250.50Percent0.751.001.251.501.7505101520FIGURE19.8EstimatedImpulseResponseFunctions.\nGreene-50240bookJune26,200221:55604CHAPTER19✦ModelswithLaggedVariablestheequationasmmyit=α0t+αltyi,t−l+δ
ltxi,t−l+tfi+µit.l=1l=1Intheirstudy,yitishoursworkedbyindividualiinperiodtandxitistheindividual’swageinthatperiod.Asecondequationforearningsisspecifiedwithlaggedvaluesofhoursandearningsontheright-handside.Theindividual,unobservedeffectsarefi.ThismodelissimilartotheVARin(19-30),butitdiffersinseveralwaysaswell.Thenumberofperiodsisquitesmall(14yearlyobservationsforeachindividual),buttherearenearly1000individuals.Thedynamicequationisspecifiedforaspecificperiod,however,sotherelevantsamplesizeineachcaseisn,notT.Also,thenumberoflagsinthemodelusedisrelativelysmall;theauthorsfixeditatthree.Theythushaveatwo-equationVARcontaining12unknownparameters,sixineachequation.Theauthorsusedthemodeltoanalyzecausality,measurementerror,andparameterstability—thatis,constancyofαltandδltacrosstime.Example19.8VARforMunicipalExpendituresInSection18.5,weexaminedamodelofmunicipalexpendituresproposedbyDahlbergandJohansson(2000):TheirequationofinterestismmmSSi,t=µt+βjSi,t−j+γjRi,t−j+δjGi,t−j+ui,tj=1j=1j=1fori=1,...,N=265andt=m+1,...,9.Si,t,Ri,tandGi,taremunicipalspending,receipts(taxesandfees)andcentralgovernmentgrants,respectively.AnalogousequationsarespecifiedforthecurrentvaluesofRi,tandGi,t.Thisproducesavectorautoregressionforeachmunicipality,Si,tµS,tβS,1γS,1δS,1Si,t−1Ri,t=µR,t+βR,1γR,1δR,1Ri,t−1+···Gi,tµG,tβG,1γG,1δG,1Gi,t−1SβS,mγS,mδS,mSi,t−mui,t+βγδR+uRR,mR,mR,mi,t−mi,t.βG,mγG,mδG,mGi,t−muGi,tThemodelwasestimatedbyGMM,sothediscussionattheendoftheprecedingsectionapplieshere.Wewillbeinterestedintestingwhetherchangesinmunicipalspending,Si,tareGrangercausedbychangesinrevenues,Ri,tandgrants,Gi,t.ThehypothesistobetestedisγS,j=δS,j=0forallj.Thishypothesiscanbetestedinthecontextofonlythefirstequation.ParameterestimatesanddiagnosticstatisticsaregiveninSection17.5.Wecancarryoutthetestintwoways.Intheunrestrictedequationwithallthreelaggedvaluesofallthreevariables,theminimizedGMMcriterionisq=22.8287.IfthelaggedvaluesofRandGareomittedfromtheSequation,thecriterionrisesto42.9182.15Thereare6restrictions.Thedi
fferenceis20.090sotheFstatisticis20.09/6=3.348.Wehaveover1,000degreesoffreedomforthedenominator,with265municipalitiesand5years,sowecanusethelimitingvalueforthecriticalvalue.Thisis2.10,sowemayrejectthehypothesisofnoncausalityandconcludethatchangesinrevenuesandgrantsdoGrangercausechangesinspending.15Onceagain,theseresultsdifferfromthosegivenbyDahlbergandJohansson.Asbefore,thedifferenceresultsfromouruseofthesameweightingmatrixforallGMMcomputationsincontrasttotheirrecomputationofthematrixforeachnewcoefficientvectorestimated.\nGreene-50240bookJune26,200221:55CHAPTER19✦ModelswithLaggedVariables605(Thisseemshardlysurprising.)ThealternativeapproachistouseaWaldstatistictotestthesixrestrictions.UsingthefullGMMresultsfortheSequationwith14coefficientsweobtainaWaldstatisticof15.3030.Thecriticalchi-squaredwouldbe6×2.1=12.6,soonceagain,thehypothesisisrejected.DahlbergandJohanssonapproachthecausalitytestsomewhatdifferentlybyusingasequentialtestingprocedure.(Seetheirpage413fordiscussion.)Theysuggestthattheinterveningvariablesbedroppedinturn.BydroppingfirstG,thenRandGandthenfirstRthenGandR,theyconcludethatgrantsdonotGrangercausechangesinspending(q=only.07)butintheabsenceofgrants,revenuesdo(q|grantsexcluded)=24.6.Thereverseorderproducesteststatisticsof12.2and12.4,respectively.Ourowncalculationsofthefourvaluesofqyields22.829forthefullmodel,23.1302withonlygrantsexcluded,23.0894withonlyRexcluded,and42.9182withbothexcluded,whichdisagreeswiththeirresultsbutisconsistentwithourearlierones.InstabilityofaVARModelThecoefficientsforthethree-variableVARmodelinExample19.8appearinTable18.4.Thecharacteristicrootsofthe9×9coefficientmatrixare−0.6025,0.2529,0.0840,(1.4586±0.6584i),(−0.6992±0.2019i)and(0.0611±0.6291i).Thefirstpairofcomplexrootshasmod-ulusgreaterthanone,sotheestimatedVARisunstable.Thedatadonotappeartobecon-sistentwiththisresult,thoughwithonlyfiveuseableyearsofdata,thatconclusionisabitfragile.Onemightsuspectthatthemodelisoverfit.Sincethedisturbancesareassumedtobeuncorrelatedacrossequation
s,thethreeequationshavebeenestimatedseparately.TheGMMcriterionforthesystemisthenthesumofthoseforthethreeequations.Form=3,2,and1,respectively,theseare(22.8287+30.5398+17.5810)=70.9495,30.4526+34.2590+20.5416)=85.2532,and(34.4986+53.2506+27.5927)=115.6119.Thediffer-encestatisticfortestingdownfromthreelagstotwois14.3037.Thecriticalchi-squaredforninedegreesoffreedomis19.62,soitwouldappearthatm=3maybetoolarge.Theresultsclearlyrejectthehypothesisthatm=1,however.ThecoefficientsforamodelwithtwolagsinsteadofoneappearinTable17.4.Ifweconstructfromtheseresultsinstead,weobtaina6×6matrixwhosecharacteristicrootsare1.5817,−0.2196,−0.3509±0.4362iand0.0968±0.2791i.Thesystemremainsunstable.19.7SUMMARYANDCONCLUSIONSThischapterhassurveyedaparticulartypeofregressionmodel,thedynamicregres-sion.Thesignaturefeatureofthedynamicmodeliseffectsthataredelayedorthatpersistthroughtime.Inastaticregressionsetting,effectsembodiedincoefficientsareassumedtotakeplaceallatonce.Inthedynamicmodel,theresponsetoaninnovationisdistributedthroughseveralperiods.Thefirstthreesectionsofthischapterexaminedseveraldifferentformsofsingleequationmodelsthatcontainedlaggedeffects.Thepro-gression,whichmirrorsthecurrentliteratureisfromtightlystructuredlag“models”(whichweresometimesformulatedtorespondtoashortageofdataratherthantocorrespondtoanunderlyingtheory)tounrestrictedmodelswithmultipleperiodlagstructures.Wealsoexaminedseveralhybridsofthesetwoforms,modelsthatallowlonglagsbutbuildsomeregularstructureintothelagweights.Thus,ourmodeloftheformationofexpectationsofinflationisreasonablyflexible,butdoesassumeaspecificbehavioralmechanism.Wethenexaminedseveralmethodologicalissues.Inthiscontextaselsewhere,thereisapreferenceinthemethodstowardformingbroadunrestrictedmodelsandusingfamiliarinferencetoolstoreducethemtothefinalappropriatespec-ification.Thesecondhalfofthechapterwasdevotedtoatypeofseeminglyunrelated\nGreene-50240bookJune26,200221:55606CHAPTER19✦ModelswithLaggedVariablesregressionsmodel.Thevectorautoregression,orVAR,hasbe
enamajortoolinrecentresearch.Afterdevelopingtheeconometricframework,weexaminedtwoapplications,oneinmacroeconomicscenteredonmonetarypolicyandonefrommicroeconomics.KeyTermsandConcepts•Autocorrelation•Finitelags•Polynomialinlagoperator•Autoregression•General-to-simplemethod•Polynomiallagmodel•Autoregressivedistributed•Grangernoncausality•Randomwalkwithdriftlag•Impactmultiplier•Rationallag•Autoregressiveform•Impulseresponse•Simple-to-generalapproach•Autoregressivemodel•Infinitelagmodel•Specification•Characteristicequation•Infinitelags•Stability•Commonfactor•Innovation•Stationary•Distributedlag•Invertible•Strongexogeneity•Dynamicregressionmodel•Laggedvariables•Structuralmodel•Elasticity•Lagoperator•StructuralVAR•Equilibrium•Lagweight•Superconsistent•Equilibriumerror•Meanlag•Univariateautoregression•Equilibriummultiplier•Medianlag•Vectorautoregression•Equilibriumrelationship•Moving-averageform(VAR)•Errorcorrection•Oneperiodaheadforecast•Vectormovingaverage•Exogeneity•Partialadjustment(VMA)•Expectation•PhillipscurveExercises1.Obtainthemeanlagandthelong-andshort-runmultipliersforthefollowingdistributedlagmodels:a.yt=0.55(0.02xt+0.15xt−1+0.43xt−2+0.23xt−3+0.17xt−4)+et.b.ThemodelinExercise5.c.ThemodelinExercise6.(Doforeitherxorz.)2.Explainhowtoestimatetheparametersofthefollowingmodel:yt=α+βxt+γyt−1+δyt−2+et,et=ρet−1+ut.Isthereanyproblemwithordinaryleastsquares?Letytbeconsumptionandletxtbedisposableincome.Usingthemethodyouhavedescribed,fitthepreviousmodeltothedatainAppendixTableF5.1.Reportyourresults.3.Showhowtoestimateapolynomialdistributedlagmodelwithlagsofsixperiodsandathird-orderpolynomial.4.Expandtherationallagmodely=[(0.6+2L)/(1−0.6L+0.5L2)]x+e.Whattttarethecoefficientsonxt,xt−1,xt−2,xt−3,andxt−4?5.SupposethatthemodelofExercise4werespecifiedasβ+γLyt=α+xt+et.1−δ1L−δ2L2\nGreene-50240bookJune26,200221:55CHAPTER19✦ModelswithLaggedVariables607Describeamethodofestimatingtheparameters.Isordinaryleastsquaresconsis-tent?6.Describehowtoestimatetheparametersofthemodelxtztyt=α+β+δ+ε
t,1−γL1−φLwhereεtisaseriallyuncorrelated,homoscedastic,classicaldisturbance.7.Weareinterestedinthelongrunmultiplierinthemodel6yt=β0+βjxt−j+εt.j=0Assumethatxtisanautoregressiveseries,xt=rxt−1+vtwhere|r|<1.a.Whatisthelongrunmultiplierinthismodel?b.Howwouldyouestimatethelong-runmultiplierinthismodel?c.Supposeyouthattheprecedingisthetruemodelbutyoulinearlyregressytonlyonaconstantandthefirst5lagsofxt.Howdoesthisaffectyourestimateofthelongrunmultiplier?d.Sameasc.for4lagsinsteadof5.e.UsingthemacroeconomicdatainAppendixF5.1,letytbethelogofrealin-vestmentandxtbethelogofrealoutput.Carryoutthecomputationssuggestedandreportyourfindings.Specifically,howdoestheomissionofalaggedvalueaffectestimatesoftheshort-runandlong-runmultipliersintheunrestrictedlagmodel?\nGreene-50240bookJune27,200221:1120TIME-SERIESMODELSQ20.1INTRODUCTIONForforecastingpurposes,asimplemodelthatdescribesthebehaviorofavariable(orasetofvariables)intermsofpastvalues,withoutthebenefitofawell-developedtheory,maywellprovequitesatisfactory.Researchershaveobservedthatthelargesimultaneous-equationsmacroeconomicmodelsconstructedinthe1960sfrequentlyhavepoorerforecastingperformancethanfairlysimple,univariatetime-seriesmodelsbasedonjustafewparametersandcompactspecifications.Itisjustthisobservationthathasraisedtoprominencetheunivariatetime-seriesforecastingmodelspioneeredbyBoxandJenkins(1984).Inthischapter,weintroducesomeofthetoolsemployedintheanalysisoftime-seriesdata.1Section20.2describesstationarystochasticprocesses.WeencounteredthisbodyoftheoryinChapters12,16,and19,wherewediscoveredthatcertainassump-tionswererequiredtoascribefamiliarpropertiestoatime-seriesofdata.Wecontinuethatdiscussionbydefiningseveralcharacteristicsofastationarytime-series.Therecentliteratureinmacroeconometricshasseenanexplosionofstudiesofnonstationarytimeseries.Nonstationaritymandatesarevisionofthestandardinferencetoolswehaveusedthusfar.InSection20.3,onnonstationarityandunitroots,wediscusssomeofthesetools.Section20.4oncointegrationdiscussessomeexten
sionsofregressionmod-elsthataremadenecessarywhenstronglytrended,nonstationaryvariablesappearinthem.SomeoftheconceptstobediscussedherewereintroducedinSection12.2.Sec-tion12.2alsocontainsacursoryintroductiontothenatureoftime-seriesprocesses.Itwillbeusefultoreviewthatmaterialbeforeproceedingwiththerestofthischapter.Fi-nally,Sections15.9.1onestimationand15.9.2and19.4.3onstabilityofdynamicmodelswillbeespeciallyusefulforthelattersectionsofthischapter.1Eachtopicdiscussedhereisthesubjectofavastliteraturewitharticlesandbook-lengthtreatmentsatalllevels.Forexample,twosurveypapersonthesubjectofunitrootsineconomictime-seriesdata,DieboldandNerlove(1990)andCampbellandPerron(1991)citebetweenthemover200basicsourcesonthesubject.Theliteratureonunitrootsandcointegrationisalmostsurelythemostrapidlymovingtargetineconometrics.Stock’s(1994)surveyaddshundredsofreferencestothoseintheaforementionedsurveysandbringstheliteratureuptodateasofthen.UsefulbasicreferencesonthesubjectsofthischapterareBoxandJenkins(1984);Judgeetal.(1985);Mills(1990);GrangerandNewbold(1996);GrangerandWatson(1984);Hendry,Pagan,andSargan(1984);Geweke(1984);andespeciallyHarvey(1989,1990);Enders(1995);Hamilton(1994)andPatterson(2000).Therearealsomanysurveystyleandpedagogicalarticlesonthesesubjects.TheaforementionedpaperbyDieboldandNerloveisausefultourguidethroughsomeoftheliterature.WerecommendDickey,Bell,andMiller(1986)andDickey,Jansen,andThorton(1991)aswell.Thelatterisanespeciallyclearintroductionataverybasiclevelofthefundamentaltoolsforempiricalresearchers.608\nGreene-50240bookJune27,200221:11CHAPTER20✦Time-SeriesModels60920.2STATIONARYSTOCHASTICPROCESSESTheessentialbuildingblockforthemodelstobediscussedinthischapteristhewhitenoisetime-seriesprocess,{εt},t=−∞,+∞,whereeachelementinthesequencehasE[ε]=0,E[ε2]=σ2,andCov[ε,ε]=0ttetsforalls=t.Eachelementintheseriesisarandomdrawfromapopulationwithzeromeanandconstantvariance.Itisoccasionallyassumedthatthedrawsareindependentornormallydistributed,althoughformostofouranalysis,neithe
rassumptionwillbeessential.Aunivariatetime-seriesmodeldescribesthebehaviorofavariableintermsofitsownpastvalues.Consider,forexample,theautoregressivedisturbancemodelsintro-ducedinChapter12,ut=ρut−1+εt.(20-1)Autoregressivedisturbancesaregenerallytheresidualvariationinaregressionmodelbuiltupfromwhatmaybeanelaborateunderlyingtheory,yt=βxt+ut.Thetheoryusuallystopsshortofstatingwhatentersthedisturbance.Butthepresumptionthatsometime-seriesprocessgeneratesxtshouldextendequallytout.Therearetwowaystointerpretthissimpleseries.Asstatedabove,utequalsthepreviousvalueofutplusan“innovation,”εt.Alternatively,bymanipulatingtheseries,weshowedthatutcouldbeinterpretedasanaggregationoftheentirehistoryoftheεt’s.Occasionally,statisticalevidenceisconvincingthatamoreintricateprocessisatworkinthedisturbance.Perhapsasecond-orderautoregression,ut=ρ1ut−1+ρ2ut−2+εt,(20-2)betterexplainsthemovementofthedisturbancesintheregression.Themodelmaynotarisenaturallyfromanunderlyingbehavioraltheory.Butinthefaceofcertainkindsofstatisticalevidence,onemightconcludethatthemoreelaboratemodelwouldbepreferable.2ThissectionwilldescribeseveralalternativestotheAR(1)modelthatwehavereliedoninmostoftheprecedingapplications.20.2.1AUTOREGRESSIVEMOVING-AVERAGEPROCESSESThevariableytinthemodelyt=µ+γyt−1+εt(20-3)issaidtobeautoregressive(orself-regressive)becauseundercertainassumptions,E[yt|yt−1]=µ+γyt−1.Amoregeneralpth-orderautoregressionorAR(p)processwouldbewrittenyt=µ+γ1yt−1+γ2yt−2+···+γpyt−p+εt.(20-4)2Forexample,theestimatesofεtcomputedafteracorrectionforfirst-orderautocorrelationmayfailtestsofrandomnesssuchastheLM(Section12.7.1)test.\nGreene-50240bookJune27,200221:11610CHAPTER20✦Time-SeriesModelsTheanalogytotheclassicalregressionisclear.Nowconsiderthefirstordermovingaverage,orMA(1)specificationyt=µ+εt−θεt−1.(20-5)Bywritingyt=µ+(1−θL)εtorytµ3=+εt,1−θL1−θwefindthatµy=−θy−θ2y−···+ε.tt−1t−2t1−θOnceagain,theeffectistorepresentytasafunctionofitsownpastvalues.Anextremelygeneralmodelthatencompasses(20-4)and(20-5)istheautoregres-s
ivemovingaverage,orARMA(p,q),model:yt=µ+γ1yt−1+γ2yt−2+···+γpyt−p+εt−θ1εt−1−···−θqεt−q.(20-6)NotetheconventionthattheARMA(p,q)processhaspautoregressive(laggeddependent-variable)termsandqlaggedmoving-averageterms.Researchershavefoundthatmodelsofthissortwithrelativelysmallvaluesofpandqhaveprovedquiteeffectiveasforecastingmodels.Thedisturbancesεtarelabeledtheinnovationsinthemodel.Thetermisfittingbecausetheonlynewinformationthatenterstheprocessesinperiodtisthisinnovation.Consider,then,theAR(1)processyt=µ+γyt−1+εt.(20-7)Eitherbysuccessivesubstitutionorbyusingthelagoperator,weobtain(1−γL)yt=µ+εtorµ∞i4yt=+γεt−i.(20-8)1−γi=0Theobservedseriesisaparticulartypeofaggregationofthehistoryoftheinnovations.Themovingaverage,MA(q)model,yt=µ+εt−θ1εt−1−···−θqεt−q=µ+D(L)εt,(20-9)isyetanother,particularlysimpleformofaggregationinthatonlyinformationfromtheqmostrecentperiodsisretained.Thegeneralresultisthatmanytime-seriesprocessescanbeviewedeitherasregressionsonlaggedvalueswithadditivedisturbancesoras3ThelagoperatorisdiscussedinSection19.2.2.Sinceµisaconstant,(1−θL)−1µ=µ+θµ+θ2µ+···=µ/(1−θ).Thelagoperatormaybesetequaltoonewhenitoperatesonaconstant.4SeeSection19.3.2fordiscussionofmodelswithinfinitelagstructures.\nGreene-50240bookJune27,200221:11CHAPTER20✦Time-SeriesModels611aggregationsofahistoryofinnovations.Theydifferfromonetothenextintheformofthataggregation.Moreinvolvedprocessescanbesimilarlyrepresentedineitheranautoregressiveormoving-averageform.(Wewillturntothemathematicalrequirementsbelow.)Con-sider,forexample,theARMA(2,1)process,yt=µ+γ1yt−1+γ2yt−2+εt−θεt−1,whichwecanwriteas(1−θL)εt=yt−µ−γ1yt−1−γ2yt−2.If|θ|<1,thenwecandividebothsidesoftheequationby(1−θL)andobtain∞ε=θi(y−µ−γy−γy).tt−i1t−i−12t−i−2i=0Aftersometediousmanipulation,thisequationproducestheautoregressiveform,µ∞yt=+πiyt−i+εt,1−θi=1whereπ=γ−θandπ=−(θj−γθj−1−γθj−2),j=2,3,....(20-10)11j12Alternatively,bysimilar(yetmoretedious)manipulation,wewouldbeabletowrite∞µ1−θLµyt=+2εt=+δiεt−i.(20-11)1−γ1−γ21−γ1L−γ2L1−γ1−γ2i=0Ineachcase,thewei
ghts,πiintheautoregressiveformandδiinthemoving-averageformarecomplicatedfunctionsoftheoriginalparameters.Butnonetheless,eachisjustanalternativerepresentationofthesametime-seriesprocessthatproducesthecurrentvalueofyt.Thisresultisafundamentalpropertyofcertaintimeseries.Wewillreturntotheissueafterweformallydefinetheassumptionthatwehaveusedatseveralstepsabovethatallowsthesetransformations.20.2.2STATIONARITYANDINVERTIBILITYAtseveralpointsinthepreceding,wehavealludedtothenotionofstationarity,eitherdirectlyorindirectlybymakingcertainassumptionsabouttheparametersinthemodel.InSection12.3.2,wecharacterizedanAR(1)disturbanceprocessut=ρut−1+εt,asstationaryif|ρ|<1andεtiswhitenoise.ThenE[ut]=0forallt,σ2εVar[ut]=,1−ρ2(20-12)ρ|t−s|σ2εCov[ut,us]=.1−ρ2If|ρ|≥1,thenthevarianceandcovariancesareundefined.\nGreene-50240bookJune27,200221:11612CHAPTER20✦Time-SeriesModelsInthefollowing,weuseεttodenotethewhitenoiseinnovationsintheprocess.TheARMA(p,q)processwillbedenotedasin(20-6).DEFINITION20.1CovarianceStationarityAstochasticprocessytisweaklystationaryorcovariancestationaryifitsatisfiesthefollowingrequirements:51.E[yt]isindependentoft.2.Var[yt]isafinite,positiveconstant,independentoft.3.Cov[yt,ys]isafinitefunctionof|t−s|,butnotoftors.Thethirdrequirementisthatthecovariancebetweenobservationsintheseriesisafunctiononlyofhowfaraparttheyareintime,notthetimeatwhichtheyoccur.ThesepropertiesclearlyholdfortheAR(1)processimmediatelyabove.Whethertheyapplyfortheothermodelswehaveexaminedremainstobeseen.Wedefinetheautocovarianceatlagkasλk=Cov[yt,yt−k].Notethatλ−k=Cov[yt,yt+k]=λk.Stationarityimpliesthatautocovariancesareafunctionofk,butnotoft.Forexample,in(20-12),weseethattheautocovariancesoftheAR(1)processyt=µ+γyt−1+εtareγkσ2εCov[yt,yt−k]=,k=0,1....(20-13)1−γ2If|γ|<1,thenthisprocessisstationary.ForanyMA(q)series,yt=µ+εt−θ1εt−1−···−θqεt−q,E[yt]=µ+E[εt]−θ1E[εt−1]−···−θqE[εt−q]=µ,(20-14)Var[y]=1+θ2+···+θ2σ2,t1qεCov[y,y]=(−θ+θθ+θθ+···+θθ)σ2,tt−111223q−1qεandsoonuntilCov[y,y]=[−θ+θθ]σ2,tt−(q−1)q−11qεCov[y,y]=−
θσ2,tt−qqε5Strongstationarityrequiresthatthejointdistributionofallsetsofobservations(yt,yt−1,...)beinvarianttowhentheobservationsaremade.Forpracticalpurposesineconometrics,thisstatementisatheoreticalfinepoint.Althoughweakstationarysufficesforourapplications,wewouldnotnormallyanalyzeweaklystationarytimeseriesthatwerenotstronglystationaryaswell.Indeed,weoftengoevenbeyondthisstepandassumejointnormality.\nGreene-50240bookJune27,200221:11CHAPTER20✦Time-SeriesModels613and,forlagsgreaterthanq,theautocovariancesarezero.Itfollows,therefore,thatfinitemoving-averageprocessesarestationaryregardlessofthevaluesoftheparameters.TheMA(1)processyt=εt−θεt−1isanimportantspecialcasethathasVar[yt]=(1+θ2)σ2,λ=−θσ2,andλ=0for|k|>1.e1ekFortheAR(1)process,thestationarityrequirementisthat|γ|<1,whichinturn,impliesthatthevarianceofthemovingaveragerepresentationin(20-8)isfinite.Con-sidertheAR(2)processyt=µ+γ1yt−1+γ2yt−2+εt.WritethisequationasC(L)yt=µ+εt,whereC(L)=1−γL−γL2.12Then,ifitispossible,weinvertthisresulttoproducey=[C(L)]−1(µ+ε).ttWhethertheinversionofthepolynomialinthelagoperatorleadstoaconvergentseriesdependsonthevaluesofγ1andγ2.Ifso,thenthemoving-averagerepresentationwillbe∞yt=δi(µ+εt−i)i=0sothat∞Var[y]=δ2σ2.tiεi=0Whetherthisresultisfiniteornotdependsonwhethertheseriesofδisisexplodingorconverging.FortheAR(2)case,theseriesconvergesif|γ2|<1,γ1+γ2<1,andγ−γ<1.621Forthemoregeneralcase,theautoregressiveprocessisstationaryiftherootsofthecharacteristicequation,C(z)=1−γz−γz2−···−γzp=0,12phavemodulusgreaterthanone,or“lieoutsidetheunitcircle.”7Itfollowsthatifastochasticprocessisstationary,ithasaninfinitemoving-averagerepresentation(and,ifnot,itdoesnot).TheAR(1)processisthesimplestcase.ThecharacteristicequationisC(z)=1−γz=0,6Thisrequirementrestricts(γ1,γ2)towithinatrianglewithpointsat(2,−1),(−2,−1),and(0,1).√7Therootsmaybecomplex.(SeeSections15.9.2and19.4.3.)Theyareoftheforma±bi,wherei=−1.Theunitcirclereferstothetwo-dimensionalsetofvaluesofaandbdefinedbya2+b2=1,whichdefinesacirclecenteredattheorigi
n with radius 1.

and its single root is $1/\gamma$. This root lies outside the unit circle if $|\gamma| < 1$, which we saw earlier.

Finally, consider the inversion of the moving-average process in (20-9) and (20-10). Whether this inversion is possible depends on the coefficients in $D(L)$ in the same fashion that stationarity hinges on the coefficients in $C(L)$. This counterpart to stationarity of an autoregressive process is called invertibility. For it to be possible to invert a moving-average process to produce an autoregressive representation, the roots of $D(L) = 0$ must be outside the unit circle. Notice, for example, that in (20-5), the inversion of the moving-average process is possible only if $|\theta| < 1$. Since the characteristic equation for the MA(1) process is $1 - \theta L = 0$, the root is $1/\theta$, which must be larger than one in absolute value. If the roots of the characteristic equation of a moving-average process all lie outside the unit circle, then the series is said to be invertible. Note that invertibility has no bearing on the stationarity of a process. All moving-average processes with finite coefficients are stationary. Whether an ARMA process is stationary or not depends only on the AR part of the model.

20.2.3 AUTOCORRELATIONS OF A STATIONARY STOCHASTIC PROCESS

The function
$$\lambda_k = \text{Cov}[y_t, y_{t-k}]$$
is called the autocovariance function of the process $y_t$. The autocorrelation function, or ACF, is obtained by dividing by the variance $\lambda_0$ to obtain
$$\rho_k = \frac{\lambda_k}{\lambda_0}, \quad -1 \le \rho_k \le 1.$$
For a stationary process, the ACF will be a function of $k$ and the parameters of the process. The ACF is a useful device for describing a time-series process in much the same way that the moments are used to describe the distribution of a random variable. One of the characteristics of a stationary stochastic process is an autocorrelation function that either abruptly drops to zero at some finite lag or eventually tapers off to zero. The AR(1) process provides the simplest example, since
$$\rho_k = \gamma^k,$$
which is a geometric series that either declines monotonically from $\rho_0 = 1$ if $\gamma$ is positive or follows a damped sawtooth pattern if $\gamma$ is negative. Note as well that for the process $y_t = \gamma y_{t-1} + \varepsilon_t$,
$$\rho_k = \gamma \rho_{k-1}, \quad k \ge 1,$$
which bears a noteworthy resemblance to the process itself.

For higher-order autoregressive series, the autocorrelations may decline monotonically or may progress in the fashion of a damped sine wave.$^{8}$ Consider, for example, the second-order autoregression, where we assume without loss of generality that $\mu = 0$ (since we are examining second moments in deviations from the mean):
$$y_t = \gamma_1 y_{t-1} + \gamma_2 y_{t-2} + \varepsilon_t.$$

8 The behavior is a function of the roots of the characteristic equation. This aspect is discussed in Section 15.9 and especially 15.9.3.

If the process is stationary, then $\text{Var}[y_t] = \text{Var}[y_{t-s}]$ for all $s$. Also, $\text{Var}[y_t] = \text{Cov}[y_t, y_t]$, and $\text{Cov}[\varepsilon_t, y_{t-s}] = 0$ if $s > 0$. These relationships imply that
$$\lambda_0 = \gamma_1 \lambda_1 + \gamma_2 \lambda_2 + \sigma_\varepsilon^2.$$
Now, using additional lags, we find that
$$\lambda_1 = \gamma_1 \lambda_0 + \gamma_2 \lambda_1 \quad \text{and} \quad \lambda_2 = \gamma_1 \lambda_1 + \gamma_2 \lambda_0. \qquad \text{(20-15)}$$
These three equations provide the solution:
$$\lambda_0 = \sigma_\varepsilon^2 \, \frac{(1 - \gamma_2)/(1 + \gamma_2)}{(1 - \gamma_2)^2 - \gamma_1^2}.$$
The variance is unchanging, so we can divide throughout by $\lambda_0$ to obtain the relationships for the autocorrelations,
$$\rho_1 = \gamma_1 \rho_0 + \gamma_2 \rho_1.$$
Since $\rho_0 = 1$, $\rho_1 = \gamma_1/(1 - \gamma_2)$. Using the same procedure for additional lags, we find that
$$\rho_2 = \gamma_1 \rho_1 + \gamma_2,$$
so $\rho_2 = \gamma_1^2/(1 - \gamma_2) + \gamma_2$. Generally, then, for lags of two or more,
$$\rho_k = \gamma_1 \rho_{k-1} + \gamma_2 \rho_{k-2}.$$
Once again, the autocorrelations follow the same difference equation as the series itself. The behavior of this function depends on $\gamma_1$, $\gamma_2$, and $k$, although not in an obvious way. The inherent behavior of the autocorrelation function can be deduced from the characteristic equation.$^{9}$ For the second-order process we are examining, the autocorrelations are of the form
$$\rho_k = \phi_1 (1/z_1)^k + \phi_2 (1/z_2)^k,$$
where the two roots are$^{10}$
$$1/z = \tfrac{1}{2}\left[\gamma_1 \pm \sqrt{\gamma_1^2 + 4\gamma_2}\right].$$
If the two roots are real, then we know that their reciprocals will be less than one in absolute value, so that $\rho_k$ will be the sum of two terms that are decaying to zero. If the two roots are complex, then $\rho_k$ will be the sum of two terms that are oscillating in the form of a damped sine wave.

9 The set of results that we would use to derive this result are exactly those we used in Section 19.4.3 to analyze the stability of a dynamic equation, which makes sense, of course, since the equation linking the autocorrelations is a simple difference equation.

10 We used the device in Section 19.4.4 to find the characteristic roots. For a second-order equation, the quadratic is easy to manipulate.

Applications that involve autoregressions of order greater than two are relatively unusual. Nonetheless, higher-order models can be handled in the same fashion. For the AR($p$) process
$$y_t = \gamma_1 y_{t-1} + \gamma_2 y_{t-2} + \cdots + \gamma_p y_{t-p} + \varepsilon_t,$$
the autocovariances will obey the Yule–Walker equations
$$\lambda_0 = \gamma_1 \lambda_1 + \gamma_2 \lambda_2 + \cdots + \gamma_p \lambda_p + \sigma_\varepsilon^2,$$
$$\lambda_1 = \gamma_1 \lambda_0 + \gamma_2 \lambda_1 + \cdots + \gamma_p \lambda_{p-1},$$
and so on. The autocorrelations will once again follow the same difference equation as the original series,
$$\rho_k = \gamma_1 \rho_{k-1} + \gamma_2 \rho_{k-2} + \cdots + \gamma_p \rho_{k-p}.$$

The ACF for a moving-average process is very simple to obtain. For the first-order process,
$$y_t = \varepsilon_t - \theta \varepsilon_{t-1},$$
$$\lambda_0 = (1 + \theta^2)\sigma_\varepsilon^2,$$
$$\lambda_1 = -\theta \sigma_\varepsilon^2,$$
then $\lambda_k = 0$ for $k > 1$. Higher-order processes appear similarly. For the MA(2) process, by multiplying out the terms and taking expectations, we find that
$$\lambda_0 = (1 + \theta_1^2 + \theta_2^2)\sigma_\varepsilon^2,$$
$$\lambda_1 = (-\theta_1 + \theta_1 \theta_2)\sigma_\varepsilon^2,$$
$$\lambda_2 = -\theta_2 \sigma_\varepsilon^2,$$
$$\lambda_k = 0, \quad k > 2.$$
The pattern for the general MA($q$) process $y_t = \varepsilon_t - \theta_1 \varepsilon_{t-1} - \theta_2 \varepsilon_{t-2} - \cdots - \theta_q \varepsilon_{t-q}$ is analogous. The signature of a moving-average process is an autocorrelation function that abruptly drops to zero at one lag past the order of the process. As we will explore below, this sharp distinction provides a statistical tool that will help us distinguish between these two types of processes empirically.

The mixed process, ARMA($p, q$), is more complicated since it is a mixture of the two forms. For the ARMA(1, 1) process
$$y_t = \gamma y_{t-1} + \varepsilon_t - \theta \varepsilon_{t-1},$$
the Yule–Walker equations are
$$\lambda_0 = E[y_t(\gamma y_{t-1} + \varepsilon_t - \theta \varepsilon_{t-1})] = \gamma \lambda_1 + \sigma_\varepsilon^2 - \sigma_\varepsilon^2(\theta\gamma - \theta^2),$$
$$\lambda_1 = \gamma \lambda_0 - \theta \sigma_\varepsilon^2,$$
and
$$\lambda_k = \gamma \lambda_{k-1}, \quad k > 1.$$
The general characteristic of ARMA processes is that when the moving-average component is of order $q$, then in the series of autocorrelations there will be an initial $q$ terms that are complicated functions of both the AR and MA parameters, but after $q$ periods,
$$\rho_k = \gamma_1 \rho_{k-1} + \gamma_2 \rho_{k-2} + \cdots + \gamma_p \rho_{k-p}, \quad k > q.$$

20.2.4 PARTIAL AUTOCORRELATIONS OF A STATIONARY STOCHASTIC PROCESS

The autocorrelation function ACF($k$) gives the gross correlation between $y_t$ and $y_{t-k}$. But as we saw in our analysis of the classical regression model in Section 3.4, a gross correlation such as this one can mask a completely different underlying relationship. In this setting, we observe, for example, that a correlation between $y_t$ and $y_{t-2}$ could arise primarily because both variables are correlated with $y_{t-1}$. Consider the AR(1) process $y_t = \gamma y_{t-1} + \varepsilon_t$. The second gross autocorrelation is $\rho_2 = \gamma^2$. But in the same spirit, we might ask what is the correlation between $y_t$ and $y_{t-2}$ net of the intervening effect of $y_{t-1}$? In this model, if we remove the effect of $y_{t-1}$ from $y_t$, then only $\varepsilon_t$ remains, and this disturbance is uncorrelated with $y_{t-2}$. We would conclude that the partial autocorrelation between $y_t$ and $y_{t-2}$ in this model is zero.

DEFINITION 20.2 Partial Autocorrelation Coefficient
The partial correlation between $y_t$ and $y_{t-k}$ is the simple correlation between $y_{t-k}$ and $y_t$ minus that part explained linearly by the intervening lags. That is,
$$\rho_k^* = \text{Corr}[y_t - E^*(y_t \mid y_{t-1}, \ldots, y_{t-k+1}),\; y_{t-k}],$$
where $E^*(y_t \mid y_{t-1}, \ldots, y_{t-k+1})$ is the minimum mean-squared error predictor of $y_t$ by $y_{t-1}, \ldots, y_{t-k+1}$.

The function $E^*(\cdot)$ might be the linear regression if the conditional mean happened to be linear, but it might not. The optimal linear predictor is the linear regression, however, so what we have is
$$\rho_k^* = \text{Corr}[y_t - \beta_1 y_{t-1} - \beta_2 y_{t-2} - \cdots - \beta_{k-1} y_{t-k+1},\; y_{t-k}],$$
where
$$\boldsymbol{\beta} = [\beta_1, \beta_2, \ldots, \beta_{k-1}]' = \{\text{Var}[y_{t-1}, y_{t-2}, \ldots, y_{t-k+1}]\}^{-1} \times \text{Cov}[y_t, (y_{t-1}, y_{t-2}, \ldots, y_{t-k+1})]'.$$
This equation will be recognized as a vector of regression coefficients. As such, what we are computing here (of course) is the correlation between a vector of residuals and $y_{t-k}$. There are various ways to formalize this computation [see, e.g., Enders (1995, pp. 82–85)]. One intuitively appealing approach is suggested by the equivalent definition (which is also a prescription for computing it), as follows.

DEFINITION 20.3 Partial Autocorrelation Coefficient
The partial correlation between $y_t$ and $y_{t-k}$ is the last coefficient in the linear projection of $y_t$ on $[y_{t-1}, y_{t-2}, \ldots, y_{t-k}]$,
$$\begin{pmatrix} \beta_1 \\ \beta_2 \\ \vdots \\ \beta_{k-1} \\ \rho_k^* \end{pmatrix} = \begin{pmatrix} \lambda_0 & \lambda_1 & \cdots & \lambda_{k-2} & \lambda_{k-1} \\ \lambda_1 & \lambda_0 & \cdots & \lambda_{k-3} & \lambda_{k-2} \\ \vdots & \vdots & \cdots & \vdots & \vdots \\ \lambda_{k-1} & \lambda_{k-2} & \cdots & \lambda_1 & \lambda_0 \end{pmatrix}^{-1} \begin{pmatrix} \lambda_1 \\ \lambda_2 \\ \vdots \\ \lambda_k \end{pmatrix}.$$

As before, there are some distinctive patterns for particular time-series processes. Consider first the autoregressive processes,
$$y_t = \gamma_1 y_{t-1} + \gamma_2 y_{t-2} + \cdots + \gamma_p y_{t-p} + \varepsilon_t.$$
We are interested in the last coefficient in the projection of $y_t$ on $y_{t-1}$, then on $[y_{t-1}, y_{t-2}]$, and so on. The first of these is the simple regression coefficient of $y_t$ on $y_{t-1}$, so
$$\rho_1^* = \frac{\text{Cov}[y_t, y_{t-1}]}{\text{Var}[y_{t-1}]} = \frac{\lambda_1}{\lambda_0} = \rho_1.$$
The first partial autocorrelation coefficient for any process equals the first autocorrelation coefficient.

Without doing the messy algebra, we also observe that for the AR($p$) process, $\rho_1^*$ is a mixture of all the $\gamma$ coefficients. Of course, if $p$ equals 1, then $\rho_1^* = \rho_1 = \gamma$. For the higher-order processes, the partial autocorrelations are likewise mixtures of the autoregressive coefficients until we reach $\rho_p^*$. In view of the form of the AR($p$) model, the last coefficient in the linear projection on $p$ lagged values is $\gamma_p$. Also, we can see the signature pattern of the AR($p$) process: any additional partial autocorrelations must be zero, because they will be simply $\rho_k^* = \text{Corr}[\varepsilon_t, y_{t-k}] = 0$ if $k > p$.

Combining results thus far, we have the characteristic pattern for an autoregressive process. The ACF, $\rho_k$, will gradually decay to zero, either monotonically if the characteristic roots are real or in a sinusoidal pattern if they are complex. The PACF, $\rho_k^*$, will be irregular out to lag $p$, after which it abruptly drops to zero and remains there.

The moving-average process has the mirror image of this pattern. We have already examined the ACF for the MA($q$) process; it has $q$ irregular spikes, then it falls to zero and stays there. For the PACF, write the model as
$$y_t = (1 - \theta_1 L - \theta_2 L^2 - \cdots - \theta_q L^q)\varepsilon_t.$$
If the series is invertible, which we will assume throughout, then we have
$$\frac{y_t}{1 - \theta_1 L - \cdots - \theta_q L^q} = \varepsilon_t,$$
or
$$y_t = \pi_1 y_{t-1} + \pi_2 y_{t-2} + \cdots + \varepsilon_t = \sum_{i=1}^{\infty} \pi_i y_{t-i} + \varepsilon_t.$$
The autoregressive form of the MA($q$) process has an infinite number of terms, which means that the PACF will not fall off to zero the way that the PACF of the AR process does. Rather, the PACF of an MA process will resemble the ACF of an AR process. For example, for the MA(1) process $y_t = \varepsilon_t - \theta \varepsilon_{t-1}$, the AR representation is
$$y_t = -\theta y_{t-1} - \theta^2 y_{t-2} - \cdots + \varepsilon_t,$$
whose coefficients decline geometrically in absolute value, as do the autocorrelations of an AR(1) process. Thus, the PACF of an MA(1) process decays geometrically, much like the ACF of an AR(1) process.

The ARMA($p, q$) is a mixture of the two types of processes, so its ACF and PACF are likewise mixtures of the two forms discussed above. Generalities are difficult to draw, but normally, the ACF of an ARMA process will have a few distinctive spikes in the early lags corresponding to the number of MA terms, followed by the characteristic smooth pattern of the AR part of the model. High-order MA processes are relatively uncommon in general, and high-order AR processes (greater than two) seem primarily to arise in the form of the nonstationary processes described in the next section. For a stationary process, the workhorses of the applied literature are the (2, 0) and (1, 1) processes. For the ARMA(1, 1) process, both the ACF and the PACF will display a distinctive spike at lag 1 followed by an exponentially decaying pattern thereafter.

20.2.5 MODELING UNIVARIATE TIME SERIES

The preceding discussion is largely descriptive. There is no underlying economic theory that states why a compact ARMA($p, q$) representation should adequately describe the movement of a given economic time series. Nonetheless, as a methodology for building forecasting models, this set of tools and its empirical counterpart have proved as good as and even superior to much more elaborate specifications (perhaps to the consternation of the builders of large macroeconomic models).$^{11}$ Box and Jenkins (1984) pioneered a forecasting framework based on the preceding that has been used in a great many fields and that has, certainly in terms of numbers of applications, largely supplanted the use of large integrated econometric models.

11 This observation can be overstated. Even the most committed advocate of the Box–Jenkins methods would concede that an ARMA model of, for example, housing starts will do little to reveal the link between the interest rate policies of the Federal Reserve and their variable of interest. That is, the covariation of economic variables remains as interesting as ever.

Box and Jenkins's approach to modeling a stochastic process can be motivated by the following.

THEOREM 20.1 Wold's Decomposition Theorem
Every zero mean covariance stationary stochastic process can be represented in the form
$$y_t = E^*[y_t \mid y_{t-1}, y_{t-2}, \ldots, y_{t-p}] + \sum_{i=0}^{\infty} \pi_i \varepsilon_{t-i},$$
where $\varepsilon_t$ is white noise, $\pi_0 = 1$, and the weights are square summable (that is, $\sum_{i=1}^{\infty} \pi_i^2 < \infty$), $E^*[y_t \mid y_{t-1}, y_{t-2}, \ldots, y_{t-p}]$ is the optimal linear predictor of $y_t$ based on its lagged values, and the predictor $E_t^*$ is uncorrelated with $\varepsilon_{t-i}$.

Thus, the theorem decomposes the process generating $y_t$ into
$$E_t^* = E^*[y_t \mid y_{t-1}, y_{t-2}, \ldots, y_{t-p}] = \text{the linearly deterministic component}$$
and
$$\sum_{i=0}^{\infty} \pi_i \varepsilon_{t-i} = \text{the linearly indeterministic component}.$$
The theorem states that for any stationary stochastic process, for a given choice of $p$, there is a Wold representation of the stationary series
$$y_t = \sum_{i=1}^{p} \gamma_i y_{t-i} + \sum_{i=0}^{\infty} \pi_i \varepsilon_{t-i}.$$
Note that for a specific ARMA($P, Q$) process, if $p \ge P$, then $\pi_i = 0$ for $i > Q$. For practical purposes, the problem with the Wold representation is that we cannot estimate the infinite number of parameters needed to produce the full right-hand side, and, of course, $P$ and $Q$ are unknown. The compromise, then, is to base an estimate of the representation on a model with a finite number of moving-average terms. We can seek the one that best fits the data in hand.

It is important to note that neither the ARMA representation of a process nor the Wold representation is unique. In general terms, suppose that the process generating $y_t$ is
$$\Gamma(L) y_t = \Theta(L) \varepsilon_t.$$
We assume that $\Gamma(L)$ is finite but $\Theta(L)$ need not be. Let $\Phi(L)$ be some other polynomial in the lag operator with roots that are outside the unit circle. Then
$$\frac{\Phi(L)}{\Gamma(L)} \Gamma(L) y_t = \frac{\Phi(L)}{\Gamma(L)} \Theta(L) \varepsilon_t$$
or
$$\Phi(L) y_t = \Pi(L) \varepsilon_t.$$
The new representation is fully equivalent to the old one, but it might have a different number of autoregressive parameters, which is exactly the point of the Wold decomposition. The implication is that part of the model-building process will be to determine the lag structures. Further discussion on the methodology is given by Box and Jenkins (1984).

The Box–Jenkins approach to modeling stochastic processes consists of the following steps:

1. Satisfactorily transform the data so as to obtain a stationary series. This step will usually mean taking first differences, logs, or both to obtain a series whose autocorrelation function eventually displays the characteristic exponential decay of a stationary series.
2. Estimate the parameters of the resulting ARMA model, generally by nonlinear least squares.
3. Generate the set of residuals from the estimated model and verify that they satisfactorily resemble a white noise series. If not, respecify the model and return to step 2.
4. The model can now be used for forecasting purposes.

Space limitations prevent us from giving a full presentation of the set of techniques. Since this methodology has spawned a mini-industry of its own, however, there is no shortage of book-length analyses and prescriptions to which the reader may refer. Five to consider are the canonical source, Box and Jenkins (1984), Granger and Newbold (1986), Mills (1993), Enders (1995), and Patterson (2000). Some of the aspects of the estimation and analysis steps do have broader relevance for our work here, so we will continue to examine them in some detail.

20.2.6 ESTIMATION OF THE PARAMETERS OF A UNIVARIATE TIME SERIES

The broad problem of regression estimation with time series data, which carries through to all the discussions of this chapter, is that the consistency and asymptotic normality results that we derived based on random sampling will no longer apply. For example, for a stationary series, we have assumed that $\text{Var}[y_t] = \lambda_0$ regardless of $t$. But we have yet to establish that an estimated variance,
$$c_0 = \frac{1}{T-1} \sum_{t=1}^{T} (y_t - \bar{y})^2,$$
will converge to $\lambda_0$, or anything else for that matter. It is necessary to assume that the process is ergodic. (We first encountered this assumption in Section 12.4.1; see Definition 12.3.) Ergodicity is a crucial element of our theory of estimation. When a time series has this property (with stationarity), then we can consider estimation of parameters in a meaningful sense. If the process is stationary and ergodic then, by the Ergodic Theorem (Theorems 12.1 and 12.2), moments such as $\bar{y}$ and $c_0$ converge to their population counterparts $\mu$ and $\lambda_0$.$^{12}$ The essential component of the condition is one that we have met at many points in this discussion, that autocovariances must decline sufficiently rapidly as the separation in time increases. It is possible to construct theoretical examples of processes that are stationary but not ergodic, but for practical purposes, a stationarity assumption will be sufficient for us to proceed with estimation. For example, in our models of stationary processes, if we assume that $\varepsilon_t \sim N[0, \sigma^2]$, which is common, then the stationary processes are ergodic as well.

12 The formal conditions for ergodicity are quite involved; see Davidson and MacKinnon (1993) or Hamilton (1994, Chapter 7).

Estimation of the parameters of a time-series process must begin with a determination of the type of process that we have in hand. (Box and Jenkins label this the identification step. But identification is a term of art in econometrics, so we will steer around that admittedly standard name.) For this purpose, the empirical estimates of the autocorrelation and partial autocorrelation functions are useful tools.

The sample counterpart to the ACF is the correlogram,
$$r_k = \frac{\sum_{t=k+1}^{T} (y_t - \bar{y})(y_{t-k} - \bar{y})}{\sum_{t=1}^{T} (y_t - \bar{y})^2}.$$
A plot of $r_k$ against $k$ provides a description of a process and can be used to help discern what type of process is generating the data. The sample PACF is the counterpart to the ACF, but net of the intervening lags; that is,
$$r_k^* = \frac{\sum_{t=k+1}^{T} y_t^* y_{t-k}^*}{\sum_{t=k+1}^{T} (y_{t-k}^*)^2},$$
where $y_t^*$ and $y_{t-k}^*$ are residuals from the regressions of $y_t$ and $y_{t-k}$ on $[1, y_{t-1}, y_{t-2}, \ldots, y_{t-k+1}]$. We have seen this at many points before; $r_k^*$ is simply the last linear least squares regression coefficient in the regression of $y_t$ on $[1, y_{t-1}, y_{t-2}, \ldots, y_{t-k+1}, y_{t-k}]$. Plots of the ACF and PACF of a series are usually presented together. Since the sample estimates of the autocorrelations and partial autocorrelations are not likely to be identically zero even when the population values are, we use diagnostic tests to discern whether a time series appears to be nonautocorrelated.$^{13}$ Individual sample autocorrelations will be approximately distributed with mean zero and variance $1/T$ under the hypothesis that the series is white noise. The Box–Pierce (1970) statistic
$$Q = T \sum_{k=1}^{p} r_k^2$$
is commonly used to test whether a series is white noise. Under the null hypothesis that the series is white noise, $Q$ has a limiting chi-squared distribution with $p$ degrees of freedom. A refinement that appears to have better finite-sample properties is the Ljung–Box (1979) statistic,
$$Q' = T(T+2) \sum_{k=1}^{p} \frac{r_k^2}{T-k}.$$
The limiting distribution of $Q'$ is the same as that of $Q$.

The process of finding the appropriate specification is essentially trial and error. An initial specification based on the sample ACF and PACF can be found. The parameters of the model can then be estimated by least squares. For pure AR($p$) processes, the estimation step is simple. The parameters can be estimated by linear least squares. If there are moving-average terms, then linear least squares is inconsistent, but the parameters of the model can be fit by nonlinear least squares. Once the model has been estimated, a set of residuals is computed to assess the adequacy of the specification. In an AR model, the residuals are just the deviations from the regression line.

The adequacy of the specification can be examined by applying the foregoing techniques to the estimated residuals. If they appear satisfactorily to mimic a white noise process, then analysis can proceed to the forecasting step. If not, a new specification should be considered.

Example 20.1 ACF and PACF for a Series of Bond Yields
Appendix Table F20.1 lists
5 years of monthly averages of the yield on a Moody's Aaa rated corporate bond. The series is plotted in Figure 20.1. From the figure, it would appear that stationarity may not be a reasonable assumption. We will return to this question below. The ACF and PACF for the original series are shown in Table 20.1, with the diagnostic statistics discussed earlier.

The plots appear to be consistent with an AR(2) process, although the ACF at longer lags seems a bit more persistent than might have been expected. Once again, this condition may indicate that the series is not stationary. Maintaining that assumption for the present, we computed the residuals from the AR(2) model and subjected them to the same tests as the original series. The coefficients of the AR(2) model are 1.1566 and −0.2083, which also satisfy the restrictions for stationarity given in Section 20.2.2. Despite the earlier suggestions, the residuals do appear to resemble a white noise series (Table 20.2).

13 The LM test discussed in Section 12.7.1 is one of these.

FIGURE 20.1 Monthly Data on Bond Yields. [Plot of Yield against Month, 1990.1 to 1994.12.]

TABLE 20.1 ACF and PACF for Bond Yields

Time-series identification for YIELD
Box–Pierce statistic = 323.0587    Box–Ljung statistic = 317.4389
Degrees of freedom   = 14          Degrees of freedom   = 14
Significance level   = 0.0000      Significance level   = 0.0000
→ |coefficient| > 2/sqrt(N) or > 95% significant

  Lag   Autocorrelation   Box–Pierce   Partial Autocorrelation
   1         0.970           56.42            0.970
   2         0.908          105.93           −0.573
   3         0.840          148.29            0.157
   4         0.775          184.29           −0.043
   5         0.708          214.35           −0.309
   6         0.636          238.65           −0.024
   7         0.567          257.93           −0.037
   8         0.501          272.97            0.059
   9         0.439          284.51           −0.068
  10         0.395          293.85            0.216
  11         0.370          302.08           −0.180
  12         0.354          309.58            0.048
  13         0.339          316.48            0.162
  14         0.331          323.06            0.171

TABLE 20.2 ACF and PACF for Residuals

Time-series identification for U
Box–Pierce statistic = 13.7712     Box–Ljung statistic = 16.1336
Significance level   = 0.4669      Significance level   = 0.3053
→ |coefficient| > 2/sqrt(N) or > 95% significant

  Lag   Autocorrelation   Box–Pierce   Partial Autocorrelation
   1         0.154            1.38            0.154
   2        −0.147            2.64           −0.170
   3        −0.207            5.13           −0.179
   4         0.161            6.64            0.183
   5         0.117            7.43            0.068
   6         0.114            8.18            0.094
   7        −0.110            8.89           −0.066
   8         0.041            8.99            0.125
   9        −0.168           10.63           −0.258
  10         0.014           10.64            0.035
  11        −0.016           10.66            0.015
  12        −0.009           10.66           −0.089
  13        −0.195           12.87           −0.166
  14        −0.125           13.77            0.132

20.2.7 THE FREQUENCY DOMAIN

For the analysis of macroeconomic flow data such as output and consumption, and aggregate economic index series such as the price level and the rate of unemployment, the tools described in the previous sections have proved quite satisfactory. The low frequency of observation (yearly, quarterly, or, occasionally, monthly) and very significant aggregation (both across time and of individuals) make these data relatively smooth and straightforward to analyze. Much contemporary economic analysis, especially financial econometrics, has dealt with more disaggregated, microlevel data, observed at far greater frequency. Some important examples are stock market data for which daily returns data are routinely available, and exchange rate movements, which have been tabulated on an almost continuous basis. In these settings, analysts have found that the tools of spectral analysis, and the frequency domain, have provided many useful results and have been applied to great advantage. This section introduces a small amount of the terminology of spectral analysis to acquaint the reader with a few basic features of the technique. For those who desire further detail, Fuller (1976), Granger and Newbold (1996), Hamilton (1994), Chatfield (1996), Shumway (1988), and Hatanaka (1996) (among many others with direct application in economics) are excellent introductions. Most of the following is based on Chapter 6 of Hamilton (1994).

In this framework, we view an observed time series as a weighted sum of underlying series that have different cyclical patterns. For example, aggregate retail sales and construction data display several different kinds of cyclical variation, including a regular seasonal pattern and longer frequency variation associated with variation in the economy as a whole over the business cycle. The total variance of an observed time series may thus be viewed as a sum of the contributions of these underlying series, which vary at different frequencies. The standard application we consider is how
spectral analysis is used to decompose the variance of a time series.

20.2.7.a. Theoretical Results

Let $\{y_t\}_{t=-\infty,\infty}$ define a zero mean, stationary time-series process. The autocovariance at lag $k$ was defined in Section 20.2.2 as
$$\lambda_k = \lambda_{-k} = \text{Cov}[y_t, y_{t-k}].$$
We assume that the series $\lambda_k$ is absolutely summable; $\sum_{k=0}^{\infty} |\lambda_k|$ is finite. The autocovariance generating function for this time-series process is
$$g_Y(z) = \sum_{k=-\infty}^{\infty} \lambda_k z^k.$$
We evaluate this function at the complex value $z = \exp(-i\omega)$, where $i = \sqrt{-1}$ and $\omega$ is a real number, and divide by $2\pi$ to obtain the spectrum, or spectral density function, of the time-series process,
$$h_Y(\omega) = \frac{1}{2\pi} \sum_{k=-\infty}^{\infty} \lambda_k e^{-i\omega k}. \qquad \text{(20-16)}$$
The spectral density function is a characteristic of the time-series process very much like the sequence of autocovariances (or the sequence of moments for a probability distribution). For a time-series process that has the set of autocovariances $\lambda_k$, the spectral density can be computed at any particular value of $\omega$. Several results can be combined to simplify $h_Y(\omega)$:

1. Symmetry of the autocovariances, $\lambda_k = \lambda_{-k}$;
2. DeMoivre's theorem, $\exp(\pm i\omega k) = \cos(\omega k) \pm i \sin(\omega k)$;
3. Polar values, $\cos(0) = 1$, $\cos(\pi) = -1$, $\sin(0) = 0$, $\sin(\pi) = 0$;
4. Symmetries of sin and cos functions, $\sin(-\omega) = -\sin(\omega)$ and $\cos(-\omega) = \cos(\omega)$.

One of the convenient consequences of result 2 is $\exp(i\omega k) + \exp(-i\omega k) = 2\cos(\omega k)$, which is always real. These equations can be combined to simplify the spectrum:
$$h_Y(\omega) = \frac{1}{2\pi}\left[\lambda_0 + 2\sum_{k=1}^{\infty} \lambda_k \cos(\omega k)\right], \quad \omega \in [0, \pi]. \qquad \text{(20-17)}$$
This is a strictly real-valued, continuous function of $\omega$. Since the cosine function is cyclic with period $2\pi$, $h_Y(\omega) = h_Y(\omega + M 2\pi)$ for any integer $M$, which implies that the entire spectrum is known if its values for $\omega$ from 0 to $\pi$ are known. [Since $\cos(-\omega) = \cos(\omega)$, $h_Y(\omega) = h_Y(-\omega)$, so the values of the spectrum for $\omega$ from 0 to $-\pi$ are the same as those from 0 to $+\pi$.] There is also a correspondence between the spectrum and the autocovariances,
$$\lambda_k = \int_{-\pi}^{\pi} h_Y(\omega) \cos(k\omega)\, d\omega,$$
which we can interpret as indicating that the sequence of autocovariances and the spectral density function just produce two different ways of looking at the same time-series process (in the first case, in the "time domain," and in the second case, in the "frequency domain," hence the name for this analysis).

The spectral density function is a function of the infinite sequence of autocovariances. For ARMA processes, however, the autocovariances are functions of the usually small numbers of parameters, so $h_Y(\omega)$ will generally simplify considerably. For the ARMA($p, q$) process defined in (20-6),
$$(y_t - \mu) = \gamma_1 (y_{t-1} - \mu) + \cdots + \gamma_p (y_{t-p} - \mu) + \varepsilon_t - \theta_1 \varepsilon_{t-1} - \cdots - \theta_q \varepsilon_{t-q}$$
or
$$\Gamma(L)(y_t - \mu) = \Theta(L)\varepsilon_t,$$
the autocovariance generating function is
$$g_Y(z) = \frac{\sigma^2 \Theta(z)\Theta(1/z)}{\Gamma(z)\Gamma(1/z)} = \sigma^2 \Pi(z)\Pi(1/z),$$
where $\Pi(z)$ gives the sequence of coefficients in the infinite moving-average representation of the series, $\Theta(z)/\Gamma(z)$. See, for example, (201), where this result is derived for the ARMA(2, 1) process. In some cases, this result can be used explicitly to derive the spectral density function. The spectral density function can be obtained from this relationship through
$$h_Y(\omega) = \frac{\sigma^2}{2\pi} \Pi(e^{-i\omega})\Pi(e^{i\omega}).$$

Example 20.2 Spectral Density Function for an AR(1) Process
For an AR(1) process with autoregressive parameter $\rho$, $y_t = \rho y_{t-1} + \varepsilon_t$, $\varepsilon_t \sim N[0, 1]$, the lag polynomials are $\Theta(z) = 1$ and $\Gamma(z) = 1 - \rho z$. The autocovariance generating function is
$$g_Y(z) = \frac{\sigma^2}{(1 - \rho z)(1 - \rho/z)} = \frac{\sigma^2}{1 + \rho^2 - \rho(z + 1/z)} = \frac{\sigma^2}{1 + \rho^2} \sum_{i=0}^{\infty} \left(\frac{\rho}{1 + \rho^2}\right)^i \left(\frac{1 + z^2}{z}\right)^i.$$
The spectral density function is
$$h_Y(\omega) = \frac{\sigma^2}{2\pi} \frac{1}{[1 - \rho\exp(-i\omega)][1 - \rho\exp(i\omega)]} = \frac{\sigma^2}{2\pi} \frac{1}{[1 + \rho^2 - 2\rho\cos(\omega)]}.$$

For the general case suggested at the outset, $\Gamma(L)(y_t - \mu) = \Theta(L)\varepsilon_t$, there is a template we can use, which, if not simple, is at least transparent. Let $\alpha_i$ be the reciprocal of a root of the characteristic polynomial for the autoregressive part of the model, $\Gamma(\alpha_i) = 0$, $i = 1, \ldots, p$, and let $\delta_j$, $j = 1, \ldots, q$, be the same for the moving-average part of the model. Then
$$h_Y(\omega) = \frac{\sigma^2}{2\pi} \frac{\prod_{j=1}^{q} \left[1 + \delta_j^2 - 2\delta_j \cos(\omega)\right]}{\prod_{i=1}^{p} \left[1 + \alpha_i^2 - 2\alpha_i \cos(\omega)\right]}.$$
Some of the roots of either polynomial may be complex pairs, but in this case, the product for adjacent pairs $(a \pm bi)$ is real, so the function is always real valued. [Note also that $(a \pm bi)^{-1} = (a \mp bi)/(a^2 + b^2)$.]

For purposes of our initial objective, decomposing the variance of the time series, our final useful theoretical result is
$$\int_{-\pi}^{\pi} h_Y(\omega)\, d\omega = \lambda_0.$$
Thus, the total variance can be viewed as the sum of the spectral densities over all possible frequencies. (More precisely, it is the area under the spectral density.) Once again exploiting the symmetry of the cosine function, we can rewrite this equation in the form
$$2\int_{0}^{\pi} h_Y(\omega)\, d\omega = \lambda_0.$$
Consider, then, integration over only some of the frequencies;
$$\frac{2\int_{0}^{\omega_j} h_Y(\omega)\, d\omega}{\lambda_0} = \tau(\omega_j), \quad 0 < \omega_j \le \pi, \quad 0 < \tau(\omega_j) \le 1.$$
Thus, $\tau(\omega_j)$ can be interpreted as the proportion of the total variance of the time series that is associated with frequencies less than or equal to $\omega_j$.

20.2.7.b. Empirical Counterparts

We have in hand a sample of observations, $y_t$, $t = 1, \ldots, T$. The first task is to establish a correspondence between the frequencies $0 < \omega \le \pi$ and something of interest in the sample. The lowest frequency we could observe would be once in the entire sample period, so we map $\omega_1$ to $2\pi/T$. The highest would then be $\omega_T = 2\pi$, and the intervening values will be $2\pi j/T$, $j = 2, \ldots, T-1$. It may be more convenient to think in terms of period rather than frequency. The number of periods per cycle will correspond to $T/j = 2\pi/\omega_j$. Thus, the lowest frequency, $\omega_1$, corresponds to the highest period, $T$ "dates" (months, quarters, years, etc.).

There are a number of ways to estimate the population spectral density function. The obvious way is the sample counterpart to the population spectrum. The sample of $T$ observations provides the variance and $T - 1$ distinct sample autocovariances
$$c_k = c_{-k} = \frac{1}{T} \sum_{t=k+1}^{T} (y_t - \bar{y})(y_{t-k} - \bar{y}), \quad \bar{y} = \frac{1}{T} \sum_{t=1}^{T} y_t, \quad k = 0, 1, \ldots, T-1,$$
so we can compute the sample periodogram, which is
$$\hat{h}_Y(\omega) = \frac{1}{2\pi}\left[c_0 + 2\sum_{k=1}^{T-1} c_k \cos(\omega k)\right].$$
The sample periodogram is a natural estimator of the spectrum, but it has a statistical flaw. With the sample variance and the $T - 1$ autocovariances, we are estimating $T$ parameters with $T$ observations. The periodogram is, in the end, $T$ transformations of these $T$ estimates. As such, there are no "degrees of freedom"; the estimator does not improve as the sample size increases. A number of methods have been suggested for improving the behavior of the estimator. Two common ways are truncation and windowing [see Chatfield (1996, pp. 139–143)]. The truncated estimator of the periodogram is based on a subset of the first L [...]

−3.44.017880922 and $DF_\gamma = 202(0.9584940384 - 1) = -8.38 > -21.2$. Neither is less than the critical value, so we conclude (as have others) that there is a unit root in the log GDP process. The results of the other tests are shown in Table 20.6. Surprisingly, these results do differ sharply from those obtained by Cecchetti and Rich (2001) for $\pi$ and $m$. The sample period appears to matter; if we repeat the computation using Cecchetti and Rich's interval, 1959.4 to 1997.4, then $DF_\tau$ equals −3.51. This is borderline, but less contradictory. For $m$ we obtain a value of −4.204 for $DF_\tau$ when the sample is restricted to the shorter interval.

FIGURE 20.11 T Bill Rate. [Plot of T-bill Rate against Quarter, 1950 to 2002.]

FIGURE 20.12 Change in the Money Stock. [Plot of M1 against Quarter, 1950 to 2002.]

FIGURE 20.13 Ex Post Real T Bill Rate. [Plot of Real Interest Rate against Quarter, 1950 to 2002.]

FIGURE 20.14 Change in the Real Money Stock. [Plot of Real M1 against Quarter, 1950 to 2002.]

TABLE 20.6 Unit Root Tests. (Standard errors of estimates in parentheses)

  Variable   µ          β           γ          DFτ     DFγ       Conclusion         R²      s
  π          0.332                  0.659      −6.40   −68.88    Reject H0          0.432   0.643
             (0.0696)               (0.0532)
  y          0.320      0.00033     0.958      −2.35   −8.48     Do not reject H0   0.999   0.001
             (0.134)    (0.00015)   (0.0179)
  i          0.228                  0.961      −2.14   −7.88     Do not reject H0   0.933   0.743
             (0.109)                (0.0182)
  m          0.448                  0.596      −7.05   −81.61    Reject H0          0.351   0.929
             (0.0923)               (0.0573)
  i − π      0.615                  0.557      −7.57   −89.49    Reject H0          0.311   2.395
             (0.185)                (0.0585)
  m − π      0.0700                 0.490      −8.25   −103.02   Reject H0          0.239   1.176
             (0.0833)               (0.0618)

The Dickey–Fuller tests described above assume that the disturbances in the model as stated are white noise. An extension which will accommodate some forms of serial correlation is the augmented Dickey–Fuller test. The augmented Dickey–Fuller test is the same one as above, carried out in the context of the model
$$y_t = \mu + \beta t + \gamma y_{t-1} + \gamma_1 \Delta y_{t-1} + \cdots + \gamma_p \Delta y_{t-p} + \varepsilon_t.$$
The random walk form is obtained by imposing $\mu = 0$ and $\beta = 0$; the random walk with drift has $\beta = 0$; and the trend stationary model leaves both parameters free. The two test statistics are
$$DF_\tau = \frac{\hat{\gamma} - 1}{\text{Est. Std. Error}(\hat{\gamma})},$$
exactly as constructed before, and
$$DF_\gamma = \frac{T(\hat{\gamma} - 1)}{1 - \hat{\gamma}_1 - \cdots - \hat{\gamma}_p}.$$
The advantage of this formulation is that it can accommodate higher-order autoregressive processes in $\varepsilon_t$.

An alternative formulation may prove convenient. By subtracting $y_{t-1}$ from both sides of the equation, we obtain
$$\Delta y_t = \mu + \gamma^* y_{t-1} + \sum_{j=1}^{p-1} \phi_j \Delta y_{t-j} + \varepsilon_t,$$
where
$$\phi_j = -\sum_{k=j+1}^{p} \gamma_k \quad \text{and} \quad \gamma^* = \left(\sum_{i=1}^{p} \gamma_i\right) - 1.$$
The unit root test is carried out as before by testing the null hypothesis $\gamma^* = 0$ against $\gamma^* < 0$.$^{22}$ The $t$ test, $DF_\tau$, may be used. If the failure to reject the unit root is taken as evidence that a unit root is present, i.e., $\gamma^* = 0$, then the model specializes to the AR($p - 1$) model in the first differences, which is an ARIMA($p - 1$, 1, 0) model for $y_t$. For a model with a time trend,
$$\Delta y_t = \mu + \beta t + \gamma^* y_{t-1} + \sum_{j=1}^{p-1} \phi_j \Delta y_{t-j} + \varepsilon_t,$$
the test is carried out by testing the joint hypothesis that $\beta = \gamma^* = 0$. Dickey and Fuller (1981) present counterparts to the critical $F$ statistics for testing the hypothesis. Some of their values are reproduced in the first row of Table 20.4. (Authors frequently focus on $\gamma^*$ and ignore the time trend, maintaining it only as part of the appropriate formulation. In this case, one may use the simple test of $\gamma^* = 0$ as before, with the $DF_\tau$ critical values.)

22 It is easily verified that one of the roots of the characteristic polynomial is $1/(\gamma_1 + \gamma_2 + \cdots + \gamma_p)$.

The lag length, $p$, remains to be determined. As usual, we are well advised to test down to the right value instead of up. One can take the familiar approach and sequentially examine the $t$ statistic on the last coefficient; the usual $t$ test is appropriate. An alternative is to combine a measure of model fit, such as the regression $s^2$, with one of the information criteria. The Akaike and Schwarz (Bayesian) information criteria would produce the two information measures
$$IC(p) = \ln\left(\frac{\mathbf{e}'\mathbf{e}}{T - p_{\max} - K^*}\right) + (p + K^*)\left(\frac{A^*}{T - p_{\max} - K^*}\right),$$
where
$K^*$ = 1 for random walk, 2 for random walk with drift, 3 for trend stationary,
$A^*$ = 2 for Akaike criterion, $\ln(T - p_{\max} - K^*)$ for Bayesian criterion,
$p_{\max}$ = the largest lag length being considered.
The remaining detail is to decide upon $p_{\max}$. The theory provides little guidance here. On the basis of a large number of simulations, Schwert (1989) found that
$$p_{\max} = \text{integer part of } [12 \times (T/100)^{.25}]$$
gave good results.

Many alternatives to the Dickey–Fuller tests have been suggested, in some cases to improve on the finite sample properties and in others to accommodate more general modeling frameworks. The Phillips (1987) and Phillips and Perron (1988) statistic may be computed for the same three functional forms,
$$y_t = \delta_t + \gamma y_{t-1} + \gamma_1 \Delta y_{t-1} + \cdots + \gamma_p \Delta y_{t-p} + \varepsilon_t, \qquad \text{(20-23)}$$
where $\delta_t$ may be 0, $\mu$, or $\mu + \beta t$. The procedure modifies the two Dickey–Fuller statistics we examined above;
$$Z_\tau = \sqrt{\frac{c_0}{a}}\left(\frac{\hat{\gamma} - 1}{v}\right) - \frac{1}{2}(a - c_0)\frac{Tv}{\sqrt{a s^2}},$$
$$Z_\gamma = \frac{T(\hat{\gamma} - 1)}{1 - \hat{\gamma}_1 - \cdots - \hat{\gamma}_p} - \frac{1}{2}\left(\frac{T^2 v^2}{s^2}\right)(a - c_0),$$
where
$$s^2 = \frac{\sum_{t=1}^{T} e_t^2}{T - K},$$
$$v^2 = \text{estimated asymptotic variance of } \hat{\gamma},$$
$$c_j = \frac{1}{T} \sum_{t=j+1}^{T} e_t e_{t-j}, \quad j = 0, \ldots, p = j\text{th autocovariance of residuals},$$
$$c_0 = [(T - K)/T]s^2,$$
$$a = c_0 + 2\sum_{j=1}^{L}\left(1 - \frac{j}{L+1}\right)c_j.$$
(Note the Newey–West (Bartlett) weights in the computation of $a$. As before, the analyst must choose $L$.) The test statistics are referred to the same Dickey–Fuller tables we have used before.

Elliott, Rothenberg, and Stock (1996) have proposed a method they denote the ADF-GLS procedure, which is designed to accommodate more general formulations of $\varepsilon$; the process generating $\varepsilon_t$ is assumed to be an I(0) stationary process, possibly an ARMA($r, s$). The null hypothesis, as before, is $\gamma = 1$ in (20-23) where $\delta_t = \mu$ or $\mu + \beta t$. The method proceeds as follows:

Step 1. Linearly regress
$$\mathbf{y}^* = \begin{pmatrix} y_1 \\ y_2 - \bar{r} y_1 \\ \vdots \\ y_T - \bar{r} y_{T-1} \end{pmatrix} \quad \text{on} \quad \mathbf{X}^* = \begin{pmatrix} 1 \\ 1 - \bar{r} \\ \vdots \\ 1 - \bar{r} \end{pmatrix} \quad \text{or} \quad \mathbf{X}^* = \begin{pmatrix} 1 & 1 \\ 1 - \bar{r} & 2 - \bar{r} \\ \vdots & \vdots \\ 1 - \bar{r} & T - \bar{r}(T-1) \end{pmatrix}$$
for the random walk with drift and trend stationary cases, respectively. (Note that the second column of the matrix is simply $\bar{r} + (1 - \bar{r})t$.) Compute the residuals from this regression, $\tilde{y}_t = y_t - \hat{\delta}_t$. $\bar{r} = 1 - 7/T$ for the random walk model and $1 - 13.5/T$ for the model with a trend.

Step 2. The Dickey–Fuller $DF_\tau$ test can now be carried out using the model
$$\tilde{y}_t = \gamma \tilde{y}_{t-1} + \gamma_1 \Delta\tilde{y}_{t-1} + \cdots + \gamma_p \Delta\tilde{y}_{t-p} + \eta_t.$$
If the model does not contain the time trend, then the $t$ statistic for $(\gamma - 1)$ may be referred to the critical values in the center panel of Table 20.4. For the trend stationary model, the critical values are given in a table presented in Elliott et al. The 97.5 percent critical value for a one-tailed test from their table is −3.15.

As in many such cases of a new technique, as researchers develop large and small modifications of these tests, the practitioner is likely to have some difficulty deciding how to proceed. The Dickey–Fuller procedures have stood the test of time as robust tools that appear to give good results over a wide range of applications. The Phillips–Perron tests are very general, but appear to have less than optimal small sample properties. Researchers continue to examine them and others, such as the method of Elliott et al. Other tests are catalogued in Maddala and Kim (1998).

Example 20.6 Augmented Dickey–Fuller Test for a Unit Root in GDP
The Dickey–Fuller 1981 JASA paper is a classic in the econometrics literature; it is probably the single most frequently cited paper in the field. It seems appropriate, therefore, to revisit at least some of their work. Dickey and Fuller apply their methodology to a model for the log of a quarterly series on output, the Federal Reserve Board Production Index. The model used is
$$y_t = \mu + \beta t + \gamma y_{t-1} + \phi(y_{t-1} - y_{t-2}) + \varepsilon_t. \qquad \text{(20-24)}$$
The test is carried out by testing the joint hypothesis that both $\beta$ and $\gamma^*$ are zero in the model
$$y_t - y_{t-1} = \mu^* + \beta t + \gamma^* y_{t-1} + \phi(y_{t-1} - y_{t-2}) + \varepsilon_t.$$
(If $\gamma^* = 0$, then $\mu^*$ will also by construction.) We will repeat the study with our data on real GNP from Appendix Table F5.1 using observations 1950.1 to 2000.4.

We will use the augmented Dickey–Fuller test first. Thus, the first step is to determine the appropriate lag length for the augmented regression. Using Schwert's suggestion, we find that the maximum lag length should be allowed to reach $p_{\max}$ = {the integer part of $12[204/100]^{.25}$} = 14. The specification search uses observations 18 to 204, since as many as 17 coefficients will be estimated in the equation
$$y_t = \mu + \beta t + \gamma y_{t-1} + \sum_{j=1}^{p} \gamma_j \Delta y_{t-j} + \varepsilon_t.$$
In the sequence of 14 regressions with $j = 14, 13, \ldots$, the only statistically significant lagged difference is the first one, in the last regression, so it would appear that the model used by Dickey and Fuller would be chosen on this basis. The two information criteria produce a similar conclusion. Both of them decline monotonically from $j = 14$ all the way down to $j = 1$, so on this basis, we end the search with $j = 1$, and proceed to analyze Dickey and Fuller's model.

The linear regression results for the equation in (20-24) are
$$y_t = \underset{(0.125)}{0.368} + \underset{(0.000138)}{0.000391}\,t + \underset{(0.0167)}{0.952}\,y_{t-1} + \underset{(0.0647)}{0.36025}\,\Delta y_{t-1} + e_t, \quad s = 0.00912, \quad R^2 = 0.999647.$$
The two test statistics are
$$DF_\tau = \frac{0.95166 - 1}{0.016716} = -2.892$$
and
$$DF_\gamma = \frac{201(0.95166 - 1)}{1 - 0.36025} = -15.263.$$
Neither statistic is less than the respective critical values, which are −3.70 and −24.5. On this basis, we conclude, as have many others, that there is a unit root in log GDP.

For the Phillips and Perron statistic, we need several additional intermediate statistics. Following Hamilton (1994, page 512), we choose $L = 4$ for the long-run variance calculation. Other values we need are $T = 201$, $\hat{\gamma} = 0.9516613$, $s^2 = 0.00008311488$, $v^2 = 0.00027942647$, and the first five autocovariances, $c_0 = 0.000081469$, $c_1 = -0.00000351162$, $c_2 = 0.00000688053$, $c_3 = 0.000000597305$, and $c_4 = -0.00000128163$. Applying the
setotheweightedsumproducesa=0.0000840722,whichisonlyaminorcorrectiontoc0.Collect-ingtheresults,weobtainthePhillips–Perronstatistics,Zτ=−2.89921andZγ=−15.44133.SincetheseareappliedtothesamecriticalvaluesintheDickey–Fullertables,wereachthesameconclusionasbefore—wedonotrejectthehypothesisofaunitrootinlogGDP.\nGreene-50240bookJune27,200221:11CHAPTER20✦Time-SeriesModels64720.3.5LONGMEMORYMODELSTheautocorrelationsofanintegratedseries[I(1)orI(2)]displaythecharacteristicpatternshowninTable20.3forthelogoftheGNPdeflator.Theyremainpersistentlyextremelyhighatlonglags.Incontrast,theautocorrelationsofastationaryprocesstyp-icallydecayatanexponentialrate,solargevaluestypicallyceasetoappearafteronlyafewlags.(See,e.g.,therightmostpanelofTable20.3.)Someprocessesappeartobehavebetweenthesetwobenchmarks;theyareclearlynonstationary,yetwhendifferenced,theyappeartoshowthecharacteristicalternatingpositiveandnegativeautocorrela-tions,stillouttolonglags,thatsuggest“overdifferencing.”Buttheundifferenceddatashowsignificantautocorrelationsouttoverylonglags.Stockreturns[Lo(1991)]andexchangerates[Cheung(1993)]providesomecasesthathavebeenstudied.Inastrik-ingexample,Ding,Granger,andEngle(1993)foundsignificantautocorrelationsouttolagsofwellover2,000daysintheabsolutevaluesofdailystockmarketreturns.[SeealsoGrangerandDing(1996).]Thereisampleevidenceofalackofmemoryinstockmarketreturns,butaspateofrecentevidence,suchasthis,hasbeenconvincingthatthevolatility—theabsolutevalueresemblesthestandarddeviation—instockreturnshasextremelylongmemory.Althoughitisclearthatanextensionofthestandardmodelsofstationarytimeseriesisneededtoexplainthepersistenceoftheeffectsofshockson,forexample,GDPandthemoneystock,andmodelsofunitrootsandcointegration(seeSection20.4)doappeartobehelpful,thereremainssomethingofastatisticalbalancingactintheirconstruction.If“theroot”differsfromoneineitherdirection,thenanaltogetherdifferentsetofstatisticaltoolsiscalledfor.Formodelsofverylongtermautocorrelation,whichlikewisereflectpersistentresponseto
shocks, models of long-term memory have provided a very useful extension of the concept of nonstationarity.[23] The basic building block in this class of models is the fractionally integrated white noise series,

$$(1 - L)^d y_t = \varepsilon_t.$$

This time series has an infinite moving-average representation if $|d| < 1/2$, but it is nonstationary. For $d \neq 0$, the sequence of autocorrelations, $\rho_k = \lambda_k/\lambda_0$, is not absolutely summable. For this simple model,

$$\rho_k = \frac{\Gamma(k+d)\,\Gamma(1-d)}{\Gamma(k-d+1)\,\Gamma(d)}.$$

The first 50 values of $\rho_k$ are shown in Figure 20.15 for $d$ = 0.1, 0.25, 0.4, and 0.475. The Ding, Granger, and Engle computations display a pattern similar to that shown for 0.25 in the figure. [See Granger and Ding (1996, p. 66).] The natural extension of the model that allows for more intricate patterns in the data is the autoregressive, fractionally integrated, moving-average, or ARFIMA($p$, $d$, $q$) model,

$$(1 - L)^d\,[y_t - \gamma_1 y_{t-1} - \cdots - \gamma_p y_{t-p}] = \varepsilon_t - \theta_1\varepsilon_{t-1} - \cdots - \theta_q\varepsilon_{t-q}, \quad y_t = Y_t - \mu.$$

[23] These models appear to have originated in the physical sciences early in the 1950s, especially with Hurst (1951), whose name is given to the effect of very long term autocorrelation in observed time series. The pioneering work in econometrics is that of Taqqu (1975), Granger and Joyeux (1980), Granger (1981), Hosking (1981), and Geweke and Porter-Hudak (1983). An extremely thorough summary and an extensive bibliography are given in Baillie (1996).

FIGURE 20.15 Autocorrelations for a Fractionally Integrated Time Series. (Plot of $\rho_k$ against $k$ = 0 to 50 for the four values of $d$.)

Estimation of ARFIMA models is discussed in Baillie (1996) and the references cited there. A test for fractional integration effects is suggested by Geweke and Porter-Hudak (1983). The test is based on the slope in the linear regression of the logs of the first $n(T)$ values from the sample periodogram of $y_t$, that is, $z_k = \log h_Y(\omega_k)$, on the corresponding functions of the first $n(T)$ frequencies, $x_k = \log\{4\sin^2(\omega_k/2)\}$. Here $n(T)$ is taken to be reasonably small; Geweke and Porter-Hudak suggest $n(T) = \sqrt{T}$. A conventional $t$ test of the hypothesis that the slope equals zero is used to test the hypothesis.

Example 20.7 Long-Term Memory in the Growth of Real GNP

For the real GDP series analyzed in Example 20.6, we analyze the subseries 1950.3 to 1983.4, with $T = 135$, so we take $n(T) = 12$. The frequencies used for the periodogram are $2\pi k/135$, $k = 1, \ldots, 12$. The first 12 values from the periodogram are [0.05104, 0.4322, 0.7227, 0.3659, 1.353, 1.257, 0.05533, 1.388, 0.5955, 0.2043, 0.3040, 0.6381]. The linear regression produces an estimate of $d$ of 0.2505 with a standard error of 0.225. Thus, the hypothesis that $d$ equals zero cannot be rejected. This result is not surprising; the first seven autocorrelations of the series are 0.491, 0.281, 0.044, −0.076, −0.120, −0.052, and 0.018. They are trivial thereafter, suggesting that the series is, in fact, stationary. This assumption, in itself, creates something of an ambiguity. The log of the real GNP series does indeed appear to be I(1). But the price level used to compute real GNP is fairly convincingly I(2), or at least I($1 + d$) for some $d$ greater than zero, judging from Figure 20.7. As such, the log of real GNP is the log of a variable that is probably at least I($1 + d$). Although received results are not definitive, this result is probably not I(1).

Models of long-term memory have been extended in many directions, and the results have been fully integrated with the unit root platform discussed earlier. Baillie's survey details many of the recently developed methods.

Example 20.8 Long-Term Memory in Foreign Exchange Markets

Cheung (1993) applied the long-term memory model to a study of end-of-week exchange rates for 16 years, 1974 to 1989. The time series studied were the dollar spot rates of the British pound (BP), Deutsche mark (DM), Swiss franc (SF), French franc (FF), and Japanese yen (JY). Testing and estimation were done using the 1974 to 1987 data. The final 2 years of the sample were held out for out-of-sample forecasting. Data were analyzed in the form of first differences of the logs so that observations are week-to-week percentage changes. Plots of the data did not suggest any obvious deviation from stationarity. As an initial assessment, the undifferenced data were subjected to augmented Dickey–Fuller tests for unit roots and the hypothesis could not be rejected. Thus, analysis proceeded using the first differences of the logs. The GPH test using $n(T) = \sqrt{T}$ for long memory in the first differences produced the following estimates of $d$, with estimated "$p$ values" in parentheses. (The $p$ value is the standard normal probability that N[0, 1] is greater than or equal to the ratio of the estimated $d$ to its estimated standard error. These tests are one-sided tests. Values less than 0.05 indicate statistical significance by the usual conventions.)

Currency    BP        DM        SF        JY        FF
d           0.1869    0.2943    0.2870    0.2907    0.4240
p value     (0.106)   (0.025)   (0.028)   (0.026)   (0.003)

The unit root hypothesis is rejected in favor of the long memory model in four of the five cases. The author proceeded to estimate ARFIMA($p$, $d$, $q$) models. The coefficients of the ARFIMA models ($d$ is recomputed) are small in all cases save for the French franc, for which the estimated model is

$$(1 - L)^{0.3664}\,[(FF_t - \overline{FF}) - 0.4776(FF_{t-1} - \overline{FF}) - 0.1227(FF_{t-2} - \overline{FF})] = e_t + 0.8657\,e_{t-1}.$$

20.4 COINTEGRATION

Studies in empirical macroeconomics almost always involve nonstationary and trending variables, such as income, consumption, money demand, the price level, trade flows, and exchange rates. Accumulated wisdom and the results of the previous sections suggest that the appropriate way to manipulate such series is to use differencing and other transformations (such as seasonal adjustment) to reduce them to stationarity and then to analyze the resulting series as VARs or with the methods of Box and Jenkins. But recent research and a growing literature have shown that there are more interesting, appropriate ways to analyze trending variables. In the fully specified regression model

$$y_t = \beta x_t + \varepsilon_t,$$

there is a presumption that the disturbances $\varepsilon_t$ are a stationary, white noise series.[24] But this presumption is unlikely to be true if $y_t$ and $x_t$ are integrated series. Generally, if two series are integrated to different orders, then linear combinations of them will be integrated to the higher of the two orders. Thus, if $y_t$ and $x_t$ are I(1), that is, if both are trending variables, then we would normally expect $y_t - \beta x_t$ to be I(1) regardless of the value of $\beta$, not I(0) (i.e., not stationary). If $y_t$ and $x_t$ are each drifting upward with their own trend, then unless there is some relationship between those trends, the difference between them should also be growing, with yet another trend. There must be some kind of inconsistency in the model. On the other hand, if the two series are both I(1), then there may be a $\beta$ such that

$$\varepsilon_t = y_t - \beta x_t$$

is I(0). Intuitively, if the two series are both I(1), then this partial difference between them might be stable around a fixed mean. The implication would be that the series are drifting together at roughly the same rate. Two series that satisfy this requirement are said to be cointegrated, and the vector $[1, -\beta]$ (or any multiple of it) is a cointegrating vector. In such a case, we can distinguish between a long-run relationship between $y_t$ and $x_t$, that is, the manner in which the two variables drift upward together, and the short-run dynamics, that is, the relationship between deviations of $y_t$ from its long-run trend and deviations of $x_t$ from its long-run trend. If this is the case, then differencing of the data would be counterproductive, since it would obscure the long-run relationship between $y_t$ and $x_t$. Studies of cointegration and a related technique, error correction, are concerned with methods of estimation that preserve the information about both forms of covariation.[25]

[24] If there is autocorrelation in the model, then it has been removed through an appropriate transformation.

Example 20.9 Cointegration in Consumption and Output

Consumption and income provide one of the more familiar examples of the phenomenon described above. The logs of GDP and consumption for 1950.1 to 2000.4 are plotted in Figure 20.16. Both variables are obviously nonstationary. We have already verified that there is a unit root in the income data. We leave as an exercise for the reader to verify that the consumption variable is likewise I(1). Nonetheless, there is a clear relationship between consumption and output. To see where this discussion of relationships among variables is going, consider a simple regression of the log of consumption on the log of income, where both variables are manipulated in mean deviation form (so the regression includes a constant). The slope in that regression is 1.056765. The residuals from the regression, $u_t = [\ln Cons^*_t, \ln GDP^*_t]\,[1, -1.056765]'$ (where the "∗" indicates mean deviations), are plotted in Figure 20.17. The trend is clearly absent from the residuals. But it remains to verify whether the series of residuals is stationary. In the ADF regression of the least squares residuals on a constant (random walk with drift), the lagged value, and the lagged first difference, the coefficient on $u_{t-1}$ is 0.838488 (0.0370205) and that on $u_{t-1} - u_{t-2}$ is −0.098522. (The constant differs trivially
from zero because two observations are lost in computing the ADF regression.) With 202 observations, we find $DF_\tau = -4.63$ and $DF_\gamma = -29.55$. Both are well below the critical values, which suggests that the residual series does not contain a unit root. We conclude (at least it appears so) that even after accounting for the trend, although neither of the original variables is stationary, there is a linear combination of them that is. If this conclusion holds up after a more formal treatment of the testing procedure, we will state that log GDP and log consumption are cointegrated.

Example 20.10 Several Cointegrated Series

The theory of purchasing power parity specifies that in long-run equilibrium, exchange rates will adjust to erase differences in purchasing power across different economies. Thus, if $p_1$ and $p_0$ are the price levels in two countries and $E$ is the exchange rate between the two currencies, then in equilibrium,

$$v_t = E_t\,\frac{p_{1t}}{p_{0t}} = \mu, \text{ a constant.}$$

The price levels in any two countries are likely to be strongly trended. But allowing for short-term deviations from equilibrium, the theory suggests that for a particular $\beta = (\ln\mu, -1, 1)$, in the model

$$\ln E_t = \beta_1 + \beta_2 \ln p_{1t} + \beta_3 \ln p_{0t} + \varepsilon_t,$$

$\varepsilon_t = \ln v_t$ would be a stationary series, which would imply that the logs of the three variables in the model are cointegrated.

[25] See, for example, Engle and Granger (1987) and the lengthy literature cited in Hamilton (1994). A survey paper on VARs and cointegration is Watson (1994).

FIGURE 20.16 Logs of Consumption and GDP. (Panel title: Cointegrated Variables: Logs of GDP and Consumption. Series LOGGDP and LOGCONS, in logs, plotted by quarter.)

FIGURE 20.17 Regression Residuals. (Panel title: Residuals from Consumption–Income Regression. Residual plotted by quarter.)

We suppose that the model involves $M$ variables, $y_t = [y_{1t}, \ldots, y_{Mt}]'$, which individually may be I(0) or I(1), and a long-run equilibrium relationship,

$$y_t'\gamma - x_t'\beta = 0.$$

The "regressors" may include a constant, exogenous variables assumed to be I(0), and/or a time trend. The vector of parameters $\gamma$ is the cointegrating vector. In the short run, the system may deviate from its equilibrium, so the relationship is rewritten as

$$y_t'\gamma - x_t'\beta = \varepsilon_t,$$

where the equilibrium error $\varepsilon_t$ must be a stationary series. In fact, since there are $M$ variables in the system, at least in principle, there could be more than one cointegrating vector. In a system of $M$ variables, there can only be up to $M - 1$ linearly independent cointegrating vectors. A proof of this proposition is very simple, but useful at this point.

Proof: Suppose that $\gamma_i$ is a cointegrating vector and that there are $M$ linearly independent cointegrating vectors. Then, neglecting $x_t'\beta$ for the moment, for every $\gamma_i$, $y_t'\gamma_i$ is a stationary series $\nu_{ti}$. Any linear combination of a set of stationary series is stationary, so it follows that every linear combination of the cointegrating vectors is also a cointegrating vector. If there are $M$ such $M \times 1$ linearly independent vectors, then they form a basis for the $M$-dimensional space, so any $M \times 1$ vector can be formed from these cointegrating vectors, including the columns of an $M \times M$ identity matrix. Thus, the first column of an identity matrix would be a cointegrating vector, or $y_{t1}$ is I(0). This result is a contradiction, since we are allowing $y_{t1}$ to be I(1). It follows that there can be at most $M - 1$ cointegrating vectors.

The number of linearly independent cointegrating vectors that exist in the equilibrium system is called its cointegrating rank. The cointegrating rank may range from 1 to $M - 1$. If it exceeds one, then we will encounter an interesting identification problem. As a consequence of the observation in the preceding proof, we have the unfortunate result that, in general, if the cointegrating rank of a system exceeds one, then without out-of-sample, exact information, it is not possible to estimate behavioral relationships as cointegrating vectors. Enders (1995) provides a useful example.

Example 20.11 Multiple Cointegrating Vectors

We consider the logs of four variables, money demand $m$, the price level $p$, real income $y$, and an interest rate $r$. The basic relationship is

$$m = \gamma_0 + \gamma_1 p + \gamma_2 y + \gamma_3 r + \varepsilon.$$

The price level and real income are assumed to be I(1). The existence of long-run equilibrium in the money market implies a cointegrating vector $\alpha_1$. If the Fed follows a certain feedback rule, increasing the money stock when nominal income ($y + p$) is low and decreasing it when nominal income is high (which might make more sense in terms of rates of growth), then there is a second cointegrating vector in which $\gamma_1 = \gamma_2$ and $\gamma_3 = 0$. Suppose that we label this vector $\alpha_2$. The parameters in the money demand equation, notably the interest elasticity, are interesting quantities, and we might seek to estimate $\alpha_1$ to learn the value of this quantity. But since every linear combination of $\alpha_1$ and $\alpha_2$ is a cointegrating vector, to this point we are only able to estimate a hash of the two cointegrating vectors.

In fact, the parameters of this model are identifiable from sample information (in principle). We have specified two cointegrating vectors,

$$\gamma_1 = [1, -\gamma_{10}, -\gamma_{11}, -\gamma_{12}, -\gamma_{13}]$$

and

$$\gamma_2 = [1, -\gamma_{20}, \gamma_{21}, \gamma_{21}, 0].$$

Although it is true that every linear combination of $\gamma_1$ and $\gamma_2$ is a cointegrating vector, only the original two vectors, as they are, have ones in the first position of both and a 0 in the last position of the second. (The equality restriction actually overidentifies the parameter matrix.) This result is, of course, exactly the sort of analysis that we used in establishing the identifiability of a simultaneous-equations system.

20.4.1 COMMON TRENDS

If two I(1) variables are cointegrated, then some linear combination of them is I(0). Intuition should suggest that the linear combination does not mysteriously create a well-behaved new variable; rather, something present in the original variables must be missing from the aggregated one. Consider an example. Suppose that two I(1) variables have a linear trend,

$$y_{1t} = \alpha + \beta t + u_t,$$
$$y_{2t} = \gamma + \delta t + v_t,$$

where $u_t$ and $v_t$ are white noise. A linear combination of $y_{1t}$ and $y_{2t}$ with vector $(1, \theta)$ produces the new variable,

$$z_t = (\alpha + \theta\gamma) + (\beta + \theta\delta)t + u_t + \theta v_t,$$

which, in general, is still I(1). In fact, the only way the $z_t$ series can be made stationary is if $\theta = -\beta/\delta$. If so, then the effect of combining the two variables linearly is to remove the common linear trend, which is the basis of Stock and Watson's (1988) analysis of the problem. But their observation goes an important step beyond this one. The only way that $y_{1t}$ and $y_{2t}$ can be cointegrated to begin with is if they have a common trend of some sort. To continue, suppose that instead of the linear trend $t$, the terms on the right-hand side, $y_1$ and $y_2$, are functions of a random walk, $w_t = w_{t-1} + \eta_t$, where $\eta_t$ is white noise. The analysis is identical. But now suppose that each variable $y_{it}$ has its own random walk component $w_{it}$, $i = 1, 2$. Any linear combination of $y_{1t}$ and $y_{2t}$ must involve both random walks. It is clear that they cannot be cointegrated unless, in fact, $w_{1t} = w_{2t}$. That is, once again, they must have a common trend. Finally, suppose that $y_{1t}$ and $y_{2t}$ share two common trends,

$$y_{1t} = \alpha + \beta t + \lambda w_t + u_t,$$
$$y_{2t} = \gamma + \delta t + \pi w_t + v_t.$$

We place no restriction on $\lambda$ and $\pi$. Then, a bit of manipulation will show that it is not possible to find a linear combination of $y_{1t}$ and $y_{2t}$ that is cointegrated, even though they share common trends. The end result for this example is that if $y_{1t}$ and $y_{2t}$ are cointegrated, then they must share exactly one common trend.

As Stock and Watson determined, the preceding is the crux of the cointegration of economic variables. A set of $M$ variables that are cointegrated can be written as a stationary component plus linear combinations of a smaller set of common trends. If the cointegrating rank of the system is $r$, then there can be up to $M - r$ linear trends and $M - r$ common random walks. [See Hamilton (1994, p. 578).] (The two-variable case is special. In a two-variable system, there can be only one common trend in total.) The effect of the cointegration is to purge these common trends from the resultant variables.

20.4.2 ERROR CORRECTION AND VAR REPRESENTATIONS

Suppose that the two I(1) variables $y_t$ and $z_t$ are cointegrated and that the cointegrating vector is $[1, -\theta]$. Then all three variables $\Delta y_t = y_t - y_{t-1}$, $\Delta z_t$, and $(y_t - \theta z_t)$ are I(0). The error correction model

$$\Delta y_t = x_t'\beta + \gamma(\Delta z_t) + \lambda(y_{t-1} - \theta z_{t-1}) + \varepsilon_t$$

describes the variation in $y_t$ around its long-run trend in terms of a set of I(0) exogenous factors $x_t$, the variation of $z_t$ around its long-run trend, and the error correction $(y_t - \theta z_t)$, which is the equilibrium error in the model of cointegration. There is a tight connection between models of cointegration and models of error correction. The model in this form is reasonable as it stands, but in fact, it is only internally consistent if the two variables are cointegrated. If not, then the third term, and hence the right-hand side, cannot be I(0), even though the left-hand side must be. The upshot is that the same assumption that we make to produce the cointegration implies (and is implied by) the existence of an error correction model.[26] As we will examine in the next section, the utility of this representation is that it suggests a way to build an elaborate model of the
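long-run variation in $y_t$ as well as a test for cointegration. Looking ahead, the preceding suggests that residuals from an estimated cointegration model, that is, estimated equilibrium errors, can be included in an elaborate model of the long-run covariation of $y_t$ and $z_t$. Once again, we have the foundation of Engle and Granger's approach to analyzing cointegration.

Consider the VAR representation of the model

$$\mathbf{y}_t = \Gamma \mathbf{y}_{t-1} + \varepsilon_t,$$

where the vector $\mathbf{y}_t$ is $[y_t, z_t]'$. Now take first differences to obtain

$$\mathbf{y}_t - \mathbf{y}_{t-1} = (\Gamma - I)\mathbf{y}_{t-1} + \varepsilon_t$$

or

$$\Delta\mathbf{y}_t = \Pi \mathbf{y}_{t-1} + \varepsilon_t.$$

If all variables are I(1), then all $M$ variables on the left-hand side are I(0). Whether those on the right-hand side are I(0) remains to be seen. The matrix $\Pi$ produces linear combinations of the variables in $\mathbf{y}_t$. But as we have seen, not all linear combinations can be cointegrated. The number of such independent linear combinations is $r$ […]

[26] The result in its general form is known as the Granger representation theorem. See Hamilton (1994, p. 582).

[…] $y = 1$ if $y^* > 0$, $y = 0$ if $y^* \le 0$.

In this formulation, $x'\beta$ is called the index function. Two aspects of this construction merit our attention. First, the assumption of known variance of $\varepsilon$ is an innocent normalization. Suppose the variance of $\varepsilon$ is scaled by an unrestricted parameter $\sigma^2$. The latent regression will be $y^* = x'\beta + \sigma\varepsilon$. But $(y^*/\sigma) = x'(\beta/\sigma) + \varepsilon$ is the same model with the same data. The observed data will be unchanged; $y$ is still 0 or 1, depending only on the sign of $y^*$, not on its scale. This means that there is no information about $\sigma$ in the data, so it cannot be estimated. Second, the assumption of zero for the threshold is likewise innocent if the model contains a constant term (and not if it does not).[4] Let $a$ be the supposed nonzero threshold and $\alpha$ be an unknown constant term and, for the present, let $x$ and $\beta$ contain the rest of the index not including the constant term. Then, the probability that $y$ equals one is

$$\text{Prob}(y^* > a \mid x) = \text{Prob}(\alpha + x'\beta + \varepsilon > a \mid x) = \text{Prob}[(\alpha - a) + x'\beta + \varepsilon > 0 \mid x].$$

Since $\alpha$ is unknown, the difference $(\alpha - a)$ remains an unknown parameter. With the two normalizations,

$$\text{Prob}(y^* > 0 \mid x) = \text{Prob}(\varepsilon > -x'\beta \mid x).$$

If the distribution is symmetric, as are the normal and logistic, then

$$\text{Prob}(y^* > 0 \mid x) = \text{Prob}(\varepsilon < x'\beta \mid x) = F(x'\beta)$$

[…] 1 if $U^a > U^b$ and 0 if $U^a \le U^b$. A common formulation is the linear random utility model,

$$U^a = x'\beta_a + \varepsilon_a \quad\text{and}\quad U^b = x'\beta_b + \varepsilon_b. \qquad \text{(21-13)}$$

Then, if we denote by $Y = 1$ the consumer's choice of alternative $a$, we have

$$\begin{aligned}
\text{Prob}[Y = 1 \mid x] &= \text{Prob}[U^a > U^b] \\
&= \text{Prob}[x'\beta_a + \varepsilon_a - x'\beta_b - \varepsilon_b > 0 \mid x] \\
&= \text{Prob}[x'(\beta_a - \beta_b) + \varepsilon_a - \varepsilon_b > 0 \mid x] \\
&= \text{Prob}[x'\beta + \varepsilon > 0 \mid x]
\end{aligned} \qquad \text{(21-14)}$$

once again.

21.4 ESTIMATION AND INFERENCE IN BINARY CHOICE MODELS

With the exception of the linear probability model, estimation of binary choice models is usually based on the method of maximum likelihood. Each observation is treated as a single draw from a Bernoulli distribution (binomial with one draw). The model with success probability $F(x'\beta)$ and independent observations leads to the joint probability, or likelihood function,

$$\text{Prob}(Y_1 = y_1, Y_2 = y_2, \ldots, Y_n = y_n \mid X) = \prod_{y_i = 0}[1 - F(x_i'\beta)] \prod_{y_i = 1} F(x_i'\beta), \qquad \text{(21-15)}$$

where $X$ denotes $[x_i]_{i=1,\ldots,n}$. The likelihood function for a sample of $n$ observations can be conveniently written as

$$L(\beta \mid \text{data}) = \prod_{i=1}^{n}[F(x_i'\beta)]^{y_i}[1 - F(x_i'\beta)]^{1-y_i}. \qquad \text{(21-16)}$$

Taking logs, we obtain[6]

$$\ln L = \sum_{i=1}^{n}\left\{y_i \ln F(x_i'\beta) + (1 - y_i)\ln[1 - F(x_i'\beta)]\right\}. \qquad \text{(21-17)}$$

The likelihood equations are

$$\frac{\partial \ln L}{\partial\beta} = \sum_{i=1}^{n}\left[\frac{y_i f_i}{F_i} + (1 - y_i)\frac{-f_i}{(1 - F_i)}\right]x_i = 0, \qquad \text{(21-18)}$$

where $f_i$ is the density, $dF_i/d(x_i'\beta)$. [In (21-18) and later, we will use the subscript $i$ to indicate that the function has an argument $x_i'\beta$.] The choice of a particular form for $F_i$ leads to the empirical model.

Unless we are using the linear probability model, the likelihood equations in (21-18) will be nonlinear and require an iterative solution. All of the models we have seen thus far are relatively straightforward to analyze. For the logit model, by inserting (21-7) and (21-11) in (21-18), we get, after a bit of manipulation, the likelihood equations

$$\frac{\partial \ln L}{\partial\beta} = \sum_{i=1}^{n}(y_i - \Lambda_i)x_i = 0. \qquad \text{(21-19)}$$

Note that if $x_i$ contains a constant term, the first-order conditions imply that the average of the predicted probabilities must equal the proportion of ones in the sample.[7] This implication also bears some similarity to the least squares normal equations if we view the term $y_i - \Lambda_i$ as a residual.[8] For the normal distribution, the log-likelihood is

$$\ln L = \sum_{y_i = 0}\ln[1 - \Phi(x_i'\beta)] + \sum_{y_i = 1}\ln\Phi(x_i'\beta). \qquad \text{(21-20)}$$

The first-order conditions for maximizing $L$ are

$$\frac{\partial \ln L}{\partial\beta} = \sum_{y_i = 0}\frac{-\phi_i}{1 - \Phi_i}x_i + \sum_{y_i = 1}\frac{\phi_i}{\Phi_i}x_i = \sum_{y_i = 0}\lambda_{0i}x_i + \sum_{y_i = 1}\lambda_{1i}x_i.$$

[6] If the distribution is symmetric, as the normal and logistic are, then $1 - F(x'\beta) = F(-x'\beta)$. There is a further simplification. Let $q = 2y - 1$. Then $\ln L = \sum_i \ln F(q_i x_i'\beta)$. See (21-21).

[7] The same result holds for the linear probability model. Although regularly observed in practice, the result has not been verified for the probit model.

[8] This sort of construction arises in many models. The first derivative of the log-likelihood with respect to the constant term produces the generalized residual in many settings. See, for example, Chesher, Lancaster, and Irish (1985) and the equivalent result for the tobit model in Section 20.3.5.

Using the device suggested in footnote 6, we can reduce this to

$$\frac{\partial \log L}{\partial\beta} = \sum_{i=1}^{n}\frac{q_i\,\phi(q_i x_i'\beta)}{\Phi(q_i x_i'\beta)}x_i = \sum_{i=1}^{n}\lambda_i x_i = 0, \qquad \text{(21-21)}$$

where $q_i = 2y_i - 1$.

The actual second derivatives for the logit model are quite simple:

$$H = \frac{\partial^2 \ln L}{\partial\beta\,\partial\beta'} = -\sum_i \Lambda_i(1 - \Lambda_i)x_i x_i'. \qquad \text{(21-22)}$$

Since the second derivatives do not involve the random variable $y_i$, Newton's method is also the method of scoring for the logit model. Note that the Hessian is always negative definite, so the log-likelihood is globally concave. Newton's method will usually converge to the maximum of the log-likelihood in just a few iterations unless the data are especially badly conditioned. The computation is slightly more involved for the probit model. A useful simplification is obtained by using the variable $\lambda(y_i, \beta'x_i) = \lambda_i$ that is defined in (21-21). The second derivatives can be obtained using the result that for any $z$, $d\phi(z)/dz = -z\phi(z)$. Then, for the probit model,

$$H = \frac{\partial^2 \ln L}{\partial\beta\,\partial\beta'} = \sum_{i=1}^{n} -\lambda_i(\lambda_i + x_i'\beta)x_i x_i'. \qquad \text{(21-23)}$$

This matrix is also negative definite for all values of $\beta$. The proof is less obvious than for the logit model.[9] It suffices to note that the scalar part in the summation is $\text{Var}[\varepsilon \mid \varepsilon \le \beta'x] - 1$ when $y = 1$ and $\text{Var}[\varepsilon \mid \varepsilon \ge -\beta'x] - 1$ when $y = 0$. The unconditional variance is one. Since truncation always reduces variance (see Theorem 22.3), in both cases the variance is between zero and one, so the value is negative.[10]

The asymptotic covariance matrix for the maximum likelihood estimator can be estimated by using the negative inverse of the Hessian evaluated at the maximum likelihood estimates. There are also two other estimators available. The Berndt, Hall, Hall, and Hausman estimator [see (17-18) and Example 17.4] would be

$$B = \sum_{i=1}^{n} g_i^2\, x_i x_i',$$

where $g_i = (y_i - \Lambda_i)$ for the logit model [see (21-19)] and $g_i = \lambda_i$ for the probit model [see (21-21)]. The third estimator would be based on the expected value of the Hessian. As we saw earlier, the Hessian for the logit model does not involve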
$y_i$, so $H = E[H]$. But because $\lambda_i$ is a function of $y_i$ [see (21-21)], this result is not true for the probit model. Amemiya (1981) showed that for the probit model,

$$E\left[\frac{\partial^2 \ln L}{\partial\beta\,\partial\beta'}\right]_{\text{probit}} = \sum_{i=1}^{n}\lambda_{0i}\lambda_{1i}\,x_i x_i'. \qquad \text{(21-24)}$$

Once again, the scalar part of the expression is always negative [see (21-23) and note that $\lambda_{0i}$ is always negative and $\lambda_{1i}$ is always positive]. The estimator of the asymptotic covariance matrix for the maximum likelihood estimator is then the negative inverse of whatever matrix is used to estimate the expected Hessian. Since the actual Hessian is generally used for the iterations, this option is the usual choice. As we shall see below, though, for certain hypothesis tests, the BHHH estimator is a more convenient choice.

[9] See, for example, Amemiya (1985, pp. 273–274) and Maddala (1983, p. 63).

[10] See Johnson and Kotz (1993) and Heckman (1979). We will make repeated use of this result in Chapter 22.

In some studies [e.g., Boyes, Hoffman, and Low (1989), Greene (1992)], the mix of ones and zeros in the observed sample of the dependent variable is deliberately skewed in favor of one outcome or the other to achieve a more balanced sample than random sampling would produce. The sampling is said to be choice based. In the studies noted, the dependent variable measured the occurrence of loan default, which is a relatively uncommon occurrence. To enrich the sample, observations with $y = 1$ (default) were oversampled. Intuition should suggest (correctly) that the bias in the sample should be transmitted to the parameter estimates, which will be estimated so as to mimic the sample, not the population, which is known to be different. Manski and Lerman (1977) derived the weighted endogenous sampling maximum likelihood (WESML) estimator for this situation. The estimator requires that the true population proportions, $\omega_1$ and $\omega_0$, be known. Let $p_1$ and $p_0$ be the sample proportions of ones and zeros. Then the estimator is obtained by maximizing a weighted log-likelihood,

$$\ln L = \sum_{i=1}^{n} w_i \ln F(q_i\beta'x_i),$$

where $w_i = y_i(\omega_1/p_1) + (1 - y_i)(\omega_0/p_0)$. Note that $w_i$ takes only two different values. The derivatives and the Hessian are likewise weighted. A final correction is needed after estimation; the appropriate estimator of the asymptotic covariance matrix is the sandwich estimator discussed in the next section, $H^{-1}BH^{-1}$ (with weighted $B$ and $H$), instead of $B$ or $H$ alone. (The weights are not squared in computing $B$.)[11]

[11] WESML and the choice-based sampling estimator are not the free lunch they may appear to be. That which the biased sampling does, the weighting undoes. It is common for the end result to be very large standard errors, which might be viewed as unfortunate, insofar as the purpose of the biased sampling was to balance the data precisely to avoid this problem.

21.4.1 ROBUST COVARIANCE MATRIX ESTIMATION

The probit maximum likelihood estimator is often labeled a quasi-maximum likelihood estimator (QMLE) in view of the possibility that the normal probability model might be misspecified. White's (1982a) robust "sandwich" estimator for the asymptotic covariance matrix of the QMLE (see Section 17.9 for discussion),

$$\text{Est.Asy.Var}[\hat\beta] = [\hat H]^{-1}\hat B[\hat H]^{-1},$$

has been used in a number of recent studies based on the probit model [e.g., Fernandez and Rodriguez-Poo (1997), Horowitz (1993), and Blundell, Laisney, and Lechner (1993)]. If the probit model is correctly specified, then $\text{plim}(1/n)\hat B = \text{plim}(1/n)(-\hat H)$ and either single matrix will suffice, so the robustness issue is moot (of course). On the other hand, the probit (Q-) maximum likelihood estimator is not consistent in the presence of any form of heteroscedasticity, unmeasured heterogeneity, omitted variables (even if they are orthogonal to the included ones), nonlinearity of the functional form of the index, or an error in the distributional assumption [with some narrow exceptions as described by Ruud (1986)]. Thus, in almost any case, the sandwich estimator provides an appropriate asymptotic covariance matrix for an estimator that is biased in an unknown direction. White raises this issue explicitly, although it seems to receive little attention in the literature: "it is the consistency of the QMLE for the parameters of interest in a wide range of situations which insures its usefulness as the basis for robust estimation techniques" (1982a, p. 4). His very useful result is that if the quasi-maximum likelihood estimator converges to a probability limit, then the sandwich estimator can, under certain circumstances, be used to estimate the asymptotic covariance matrix of that estimator. But there is no guarantee that the QMLE will converge to anything interesting or useful. Simply computing a robust covariance matrix for an otherwise inconsistent estimator does not give it redemption. Consequently, the virtue of a robust covariance matrix in this setting is unclear.

21.4.2 MARGINAL EFFECTS

The predicted probabilities, $F(x'\hat\beta) = \hat F$, and the estimated marginal effects, $f(x'\hat\beta) \times \hat\beta = \hat f\hat\beta$, are nonlinear functions of the parameter estimates. To compute standard errors, we can use the linear approximation approach (delta method) discussed in Section 5.2.4. For the predicted probabilities,

$$\text{Asy.Var}[\hat F] = [\partial\hat F/\partial\hat\beta]'\,V\,[\partial\hat F/\partial\hat\beta],$$

where

$$V = \text{Asy.Var}[\hat\beta].$$

The estimated asymptotic covariance matrix of $\hat\beta$ can be any of the three described earlier. Let $z = x'\hat\beta$. Then the derivative vector is

$$[\partial\hat F/\partial\hat\beta] = [d\hat F/dz][\partial z/\partial\hat\beta] = \hat f x.$$

Combining terms gives

$$\text{Asy.Var}[\hat F] = \hat f^2\, x'Vx,$$

which depends, of course, on the particular $x$ vector used. This result is useful when a marginal effect is computed for a dummy variable. In that case, the estimated effect is

$$\Delta\hat F = \hat F \mid d = 1 \;-\; \hat F \mid d = 0. \qquad \text{(21-25)}$$

The asymptotic variance would be

$$\text{Asy.Var}[\Delta\hat F] = [\partial\Delta\hat F/\partial\hat\beta]'\,V\,[\partial\Delta\hat F/\partial\hat\beta], \qquad \text{(21-26)}$$

where

$$[\partial\Delta\hat F/\partial\hat\beta] = \hat f_1\begin{bmatrix}\bar x_{(d)} \\ 1\end{bmatrix} - \hat f_0\begin{bmatrix}\bar x_{(d)} \\ 0\end{bmatrix}.$$

For the other marginal effects, let $\hat\gamma = \hat f\hat\beta$. Then

$$\text{Asy.Var}[\hat\gamma] = \left[\frac{\partial\hat\gamma}{\partial\hat\beta'}\right]V\left[\frac{\partial\hat\gamma}{\partial\hat\beta'}\right]'.$$

The matrix of derivatives is

$$\hat f\left(\frac{\partial\hat\beta}{\partial\hat\beta'}\right) + \hat\beta\left(\frac{d\hat f}{dz}\right)\left(\frac{\partial z}{\partial\hat\beta'}\right) = \hat f I + \left(\frac{d\hat f}{dz}\right)\hat\beta x'.$$

For the probit model, $df/dz = -z\phi$, so

$$\text{Asy.Var}[\hat\gamma] = \phi^2[I - (\beta'x)\beta x']\,V\,[I - (\beta'x)\beta x']'.$$

For the logit model, $\hat f = \hat\Lambda(1 - \hat\Lambda)$, so

$$\frac{d\hat f}{dz} = (1 - 2\hat\Lambda)\frac{d\hat\Lambda}{dz} = (1 - 2\hat\Lambda)\hat\Lambda(1 - \hat\Lambda).$$

Collecting terms, we obtain

$$\text{Asy.Var}[\hat\gamma] = [\Lambda(1 - \Lambda)]^2[I + (1 - 2\Lambda)\beta x']\,V\,[I + (1 - 2\Lambda)x\beta'].$$

As before, the value obtained will depend on the $x$ vector used.

TABLE 21.1 Estimated Probability Models

                 Linear               Logistic             Probit               Weibull
Variable         Coefficient  Slope   Coefficient  Slope   Coefficient  Slope   Coefficient  Slope
Constant         −1.498       —       −13.021      —       −7.452       —       −10.631      —
GPA              0.464        0.464   2.826        0.534   1.626        0.533   2.293        0.477
TUCE             0.010        0.010   0.095        0.018   0.052        0.017   0.041        0.009
PSI              0.379        0.379   2.379        0.499   1.426        0.468   1.562        0.325
f(x̄′β̂)                       1.000                0.189                0.328                0.208

Example 21.3 Probability Models

The data listed in Appendix Table F21.1 were taken from a study by Spector and Mazzeo (1980), which examined whether a new method of teaching economics, the Personalized System of Instruction (PSI), significantly influenced
performance in later economics courses. The “dependent variable” used in our application is GRADE, which indicates whether a student’s grade in an intermediate macroeconomics course was higher than that in the principles course. The other variables are GPA, the student’s grade point average; TUCE, the score on a pretest that indicates entering knowledge of the material; and PSI, the binary indicator of whether the student was exposed to the new teaching method. (Spector and Mazzeo’s specific equation was somewhat different from the one estimated here.)

Table 21.1 presents four sets of parameter estimates. The slope parameters and derivatives were computed for four probability models: linear, probit, logit, and Weibull. The last three sets of estimates are computed by maximizing the appropriate log-likelihood function. Estimation is discussed in the next section, so standard errors are not presented here. The scale factor given in the last row is the density function evaluated at the means of the variables. Also, note that the slope given for PSI is the derivative, not the change in the function with PSI changed from zero to one with other variables held constant.

If one looked only at the coefficient estimates, then it would be natural to conclude that the four models had produced radically different estimates. But a comparison of the columns of slopes shows that this conclusion is clearly wrong. The models are very similar; in fact, the logit and probit model results are nearly identical.

The data used in this example are only moderately unbalanced between 0s and 1s for the dependent variable (21 and 11). As such, we might expect similar results for the probit and logit models.[12] One indicator is a comparison of the coefficients. In view of the different variances of the distributions, one for the normal and $\pi^2/3$ for the logistic, we might expect to obtain comparable estimates by multiplying the probit coefficients by $\pi/\sqrt{3} \approx 1.8$. Amemiya (1981) found, through trial and error, that scaling by 1.6 instead produced better results. This proportionality result is frequently cited. The result in (21-9) may help to explain the finding. The index $x'\beta$ is not the random variable. (See Section 21.3.2.) The marginal effect in the probit model for, say, $x_k$ is $\phi(x'\beta_p)\beta_{pk}$, whereas that for the logit is $\Lambda(1 - \Lambda)\beta_{lk}$. (The subscripts $p$ and $l$ are for probit and logit.) Amemiya suggests that his approximation works best at the center of the distribution, where $F = 0.5$, or $x'\beta = 0$ for either distribution. Suppose it is. Then $\phi(0) = 0.3989$ and $\Lambda(0)[1 - \Lambda(0)] = 0.25$. If the marginal effects are to be the same, then $0.3989\,\beta_{pk} = 0.25\,\beta_{lk}$, or $\beta_{lk} = 1.6\,\beta_{pk}$, which is the regularity observed by Amemiya. Note, though, that as we depart from the center of the distribution, the relationship will move away from 1.6. Since the logistic density descends more slowly than the normal, for unbalanced samples such as ours, the ratio of the logit coefficients to the probit coefficients will tend to be larger than 1.6. The ratios for the ones in Table 21.1 are closer to 1.7 than 1.6.

[12] One might be tempted in this case to suggest an asymmetric distribution for the model, such as the Weibull distribution. However, the asymmetry in the model, to the extent that it is present at all, refers to the values of $\varepsilon$, not to the observed sample of values of the dependent variable.

The computation of the derivatives of the conditional mean function is useful when the variable in question is continuous and often produces a reasonable approximation for a dummy variable. Another way to analyze the effect of a dummy variable on the whole distribution is to compute Prob(Y = 1) over the range of $x'\beta$ (using the sample estimates) and with the two values of the binary variable. Using the coefficients from the probit model in Table 21.1, we have the following probabilities as a function of GPA, at the mean of TUCE:

PSI = 0: Prob(GRADE = 1) = $\Phi[-7.452 + 1.626\,\text{GPA} + 0.052(21.938)]$
PSI = 1: Prob(GRADE = 1) = $\Phi[-7.452 + 1.626\,\text{GPA} + 0.052(21.938) + 1.426]$

Figure 21.2 shows these two functions plotted over the range of GPA observed in the sample, 2.0 to 4.0. The marginal effect of PSI is the difference between the two functions, which ranges from only about 0.06 at GPA = 2 to about 0.50 at GPA = 3.5. This effect shows that the probability that a student’s grade will increase after exposure to PSI is far greater for students with high GPAs than for those with low GPAs. At the sample mean of GPA of 3.117, the effect of PSI on the probability is 0.465. The simple derivative calculation of (21-9) is given in Table 21.1; the estimate is 0.468. But, of course, this calculation does not show the wide range of differences displayed in Figure 21.2.

FIGURE 21.2  Effect of PSI on Predicted Probabilities. [Prob(Grade = 1) plotted against GPA from 2.0 to 4.0, with and without PSI; at GPA = 3.117 the two curves are at 0.571 and 0.106.]

Table 21.2 presents the estimated coefficients and marginal effects for the probit and logit models in Table 21.1. In both cases, the asymptotic covariance matrix is computed from the negative inverse of the actual Hessian of the log-likelihood. The standard errors for the estimated marginal effect of PSI are computed using (21-25) and (21-26) since PSI is a binary variable. In comparison, the simple derivatives produce estimates and standard errors of (0.449, 0.181) for the logit model and (0.464, 0.188) for the probit model. These differ only slightly from the results given in the table.

TABLE 21.2  Estimated Coefficients and Standard Errors (Standard Errors in Parentheses)

                            Logistic                                  Probit
Variable        Coefficient  t Ratio   Slope   t Ratio    Coefficient  t Ratio   Slope   t Ratio
Constant          −13.021    −2.641      -        -         −7.452     −2.931      -        -
                   (4.931)                                  (2.542)
GPA                 2.826     2.238    0.534    2.252        1.626      2.343    0.533    2.294
                   (1.263)            (0.237)               (0.694)             (0.232)
TUCE                0.095     0.672    0.018    0.685        0.052      0.617    0.017    0.626
                   (0.142)            (0.026)               (0.084)             (0.027)
PSI                 2.379     2.234    0.456    2.521        1.426      2.397    0.464    2.727
                   (1.065)            (0.181)               (0.595)             (0.170)
log likelihood    −12.890                                  −12.819

21.4.3 HYPOTHESIS TESTS

For testing hypotheses about the coefficients, the full menu of procedures is available. The simplest method for a single restriction would be based on the usual t tests, using the standard errors from the information matrix. Using the normal distribution of the estimator, we would use the standard normal table rather than the t table for critical points. For more involved restrictions, it is possible to use the Wald test. For a set of restrictions $R\beta = q$, the statistic is

$$W = (R\hat\beta - q)'\{R(\text{Est.Asy.Var}[\hat\beta])R'\}^{-1}(R\hat\beta - q).$$

For example, for testing the hypothesis that a subset of the coefficients, say the last $M$, are zero, the Wald statistic uses $R = [0 \,|\, I_M]$ and $q = 0$. Collecting terms, we find that the test statistic for this hypothesis is

$$W = \hat\beta_M'\, V_M^{-1}\, \hat\beta_M, \qquad (21\text{-}27)$$

where the subscript $M$ indicates the subvector or submatrix corresponding to the $M$ variables and $V$ is the estimated asymptotic covariance matrix of $\hat\beta$.
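As a rough illustration of the Wald statistic for restrictions $R\beta = q$ and its special case (21-27), the following Python sketch computes $W$ directly from the formula. The function names `solve_linear` and `wald_statistic` are made up for this example. Because Table 21.2 reports only standard errors, the covariance matrix used here is assumed to be diagonal, which will not hold for the actual estimates; with a single restriction the off-diagonal terms play no role and $W$ is simply the squared t ratio.

```python
def solve_linear(A, b):
    """Solve A x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    # Augmented matrix [A | b], copied so the inputs are not modified.
    M = [[float(v) for v in A[i]] + [float(b[i])] for i in range(n)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(M[r][col]))
        if abs(M[pivot][col]) < 1e-12:
            raise ValueError("matrix is singular")
        M[col], M[pivot] = M[pivot], M[col]
        for r in range(col + 1, n):
            factor = M[r][col] / M[col][col]
            for c in range(col, n + 1):
                M[r][c] -= factor * M[col][c]
    # Back substitution on the upper-triangular system.
    x = [0.0] * n
    for i in range(n - 1, -1, -1):
        s = sum(M[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (M[i][n] - s) / M[i][i]
    return x


def wald_statistic(beta_hat, V, R, q):
    """W = (R b - q)' [R V R']^{-1} (R b - q) for J linear restrictions."""
    k, J = len(beta_hat), len(R)
    d = [sum(R[i][j] * beta_hat[j] for j in range(k)) - q[i] for i in range(J)]
    RV = [[sum(R[i][l] * V[l][j] for l in range(k)) for j in range(k)]
          for i in range(J)]
    RVRt = [[sum(RV[i][l] * R[j][l] for l in range(k)) for j in range(J)]
            for i in range(J)]
    x = solve_linear(RVRt, d)
    return sum(di * xi for di, xi in zip(d, x))


# Probit coefficients and standard errors from Table 21.2
# (order: constant, GPA, TUCE, PSI).
beta_probit = [-7.452, 1.626, 0.052, 1.426]
se_probit = [2.542, 0.694, 0.084, 0.595]
# ASSUMPTION: diagonal covariance matrix; covariances are not reported.
V_diag = [[se_probit[i] ** 2 if i == j else 0.0 for j in range(4)]
          for i in range(4)]

# Single restriction: the PSI coefficient is zero, R = [0 0 0 1], q = 0.
w_psi = wald_statistic(beta_probit, V_diag, [[0, 0, 0, 1]], [0])
print(w_psi)
```

For the single restriction on PSI, the result equals the square of the t ratio of 2.397 reported in Table 21.2, roughly 5.7, which would be compared with a chi-squared critical value with one degree of freedom. For joint restrictions the full estimated covariance matrix should replace the diagonal stand-in.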
Likelihood ratio and Lagrange multiplier statistics can also be computed. The likelihood ratio statistic is

$$LR = -2[\ln \hat L_R - \ln \hat L_U],$$

where $\ln \hat L_R$ and $\ln \hat L_U$ are the log-likelihood functions evaluated at the restricted and unrestricted estimates, respectively. A common test, which is similar to the F test that all the slopes in a regression are zero, is the likelihood ratio test that all the slope coefficients in the probit or logit model are zero. For this test, the constant term remains unrestricted. In this case, the restricted log-likelihood is the same for both probit and logit models,

$$\ln L_0 = n[P \ln P + (1 - P)\ln(1 - P)], \qquad (21\text{-}28)$$

where $P$ is the proportion of the observations that have dependent variable equal to 1.

It might be tempting to use the likelihood ratio test to choose between the probit and logit models. But there is no restriction involved, and the test is not valid for this purpose. To underscore the point, there is nothing in its construction to prevent the chi-squared statistic for this “test” from being negative.

The Lagrange multiplier test statistic is $LM = g'Vg$, where $g$ is the vector of first derivatives of the unrestricted model evaluated at the restricted parameter vector and $V$ is any of the three estimators of the asymptotic covariance matrix of the maximum likelihood estimator, once again computed using the restricted estimates. Davidson and MacKinnon (1984) find evidence that $E[H]$ is the best of the three estimators to use, which gives

$$LM = \left[\sum_{i=1}^{n} g_i x_i\right]' \left[\sum_{i=1}^{n} E[-h_i]\, x_i x_i'\right]^{-1} \left[\sum_{i=1}^{n} g_i x_i\right], \qquad (21\text{-}29)$$

where $E[-h_i]$ is defined in (21-22) for the logit model and in (21-24) for the probit model.

For the logit model, when the hypothesis is that all the slopes are zero,

$$LM = nR^2,$$

where $R^2$ is the uncentered coefficient of determination in the regression of $(y_i - \bar y)$ on $x_i$ and $\bar y$ is the proportion of 1s in the sample. An alternative formulation based on the BHHH estimator, which we developed in Section 17.5.3, is also convenient. For any of the models (probit, logit, Weibull, etc.), the first derivative vector can be written as

$$\frac{\partial \ln L}{\partial \beta} = \sum_{i=1}^{n} g_i x_i = X'Gi,$$

where $G\,(n \times n) = \text{diag}[g_1, g_2, \ldots, g_n]$ and $i$ is an $n \times 1$ column of 1s. The BHHH estimator of the Hessian is $(X'G'GX)$, so the LM statistic based on this estimator is

$$LM = n\left[\frac{1}{n}\, i'(GX)(X'G'GX)^{-1}(X'G')i\right] = nR_i^2, \qquad (21\text{-}30)$$

where $R_i^2$ is the uncentered coefficient of determination in a regression of a column of ones on the first derivatives of the logs of the individual probabilities.

All the statistics listed here are asymptotically equivalent and under the null hypothesis of the restricted model have limiting chi-squared distributions with degrees of freedom equal to the number of restrictions being tested. We consider some examples below.

21.4.4 SPECIFICATION TESTS FOR BINARY CHOICE MODELS

In the linear regression model, we considered two important specification problems, the effect of omitted variables and the effect of heteroscedasticity. In the classical model, $y = X_1\beta_1 + X_2\beta_2 + \varepsilon$, when least squares estimates $b_1$ are computed omitting $X_2$,

$$E[b_1] = \beta_1 + [X_1'X_1]^{-1}X_1'X_2\beta_2.$$

Unless $X_1$ and $X_2$ are orthogonal or $\beta_2 = 0$, $b_1$ is biased. If we ignore heteroscedasticity, then although the least squares estimator is still unbiased and consistent, it is inefficient and the usual estimate of its sampling covariance matrix is inappropriate. Yatchew and Griliches (1984) have examined these same issues in the setting of the probit and logit models. Their general results are far more pessimistic. In the context of a binary choice model, they find the following:

1. If $x_2$ is omitted from a model containing $x_1$ and $x_2$ (i.e., $\beta_2 \neq 0$), then

$$\text{plim}\, \hat\beta_1 = c_1\beta_1 + c_2\beta_2,$$

where $c_1$ and $c_2$ are complicated functions of the unknown parameters. The implication is that even if the omitted variable is uncorrelated with the included one, the coefficient on the included variable will be inconsistent.

2. If the disturbances in the underlying regression are heteroscedastic, then the maximum likelihood estimators are inconsistent and the covariance matrix is inappropriate.

The second result is particularly troubling because the probit model is most often used with microeconomic data, which are frequently heteroscedastic.

Any of the three methods of hypothesis testing discussed above can be used to analyze these specification problems. The Lagrange multiplier test has the advantage that it can be carried out using the estimates from the restricted model, which sometimes brings a large saving in computational effort. This situation is especially true for the test for heteroscedasticity.[13]

[13] The results in this section are based on Davidson and MacKinnon (1984) and Engle (1984). A symposium on the subject of specification tests in discrete choice models is Blundell (1987).

To reiterate, the Lagrange multiplier statistic is computed as follows. Let the null hypothesis, $H_0$, be a specification of the model, and let $H_1$ be the alternative. For example, $H_0$ might specify that only variables $x_1$ appear in the model, whereas $H_1$ might specify that $x_2$ appears in the model as well. The statistic is

$$LM = g_0'\, V_0^{-1}\, g_0,$$

where $g_0$ is the vector of derivatives of the log-likelihood as specified by $H_1$ but evaluated at the maximum likelihood estimator of the parameters assuming that $H_0$ is true, and $V_0^{-1}$ is any of the three consistent estimators of the asymptotic variance matrix of the maximum likelihood estimator under $H_1$, also computed using the maximum likelihood estimators based on $H_0$. The statistic is asymptotically distributed as chi-squared with degrees of freedom equal to the number of restrictions.

21.4.4.a Omitted Variables

The hypothesis to be tested is

$$H_0: y^* = \beta_1'x_1 + \varepsilon,$$
$$H_1: y^* = \beta_1'x_1 + \beta_2'x_2 + \varepsilon, \qquad (21\text{-}31)$$

so the test is of the null hypothesis that $\beta_2 = 0$. The Lagrange multiplier test would be carried out as follows:

1. Estimate the model in $H_0$ by maximum likelihood. The restricted coefficient vector is $[\hat\beta_1, 0]$.
2. Let $x$ be the compound vector, $[x_1, x_2]$.

The statistic is then computed according to (21-29) or (21-30). It is noteworthy that in this case, as in many others, the Lagrange multiplier statistic is $n$ times the coefficient of determination in a regression.

21.4.4.b Heteroscedasticity

We use the general formulation analyzed by Harvey (1976),[14]

$$\text{Var}[\varepsilon] = [\exp(z'\gamma)]^2.\text{[15]}$$

This model can be applied equally to the probit and logit models. We will derive the results specifically for the probit model; the logit model is essentially the same. Thus,

$$y^* = x'\beta + \varepsilon,$$
$$\text{Var}[\varepsilon \,|\, x, z] = [\exp(z'\gamma)]^2. \qquad (21\text{-}32)$$

[14] See Knapp and Seaks (1992) for an application. Other formulations are suggested by Fisher and Nagin (1981), Hausman and Wise (1978), and Horowitz (1993).
[15] See Section 11.7.1.

The presence of heteroscedasticity makes some care necessary in interpreting the coefficients for a variable $w_k$ that could be in $x$ or $z$ or both,

$$\frac{\partial\, \text{Prob}(Y = 1 \,|\, x, z)}{\partial w_k} = \phi\!\left[\frac{x'\beta}{\exp(z'\gamma)}\right] \frac{\beta_k - (x'\beta)\gamma_k}{\exp(z'\gamma)}.$$

Only the first (second) term applies if $w_k$ appears only in $x$ ($z$). This implies that the simple coefficient may differ radically from the effect that is of interest in the estimated model. This effect is clearly visible in the example below.

The log-likelihood is

$$\ln L = \sum_{i=1}^{n} \left\{ y_i \ln F\!\left(\frac{x_i'\beta}{\exp(z_i'\gamma)}\right) + (1 - y_i)\ln\!\left[1 - F\!\left(\frac{x_i'\beta}{\exp(z_i'\gamma)}\right)\right] \right\}. \qquad (21\text{-}33)$$

To be able to estimate all the parameters, $z$ cannot have a constant term. The derivatives are

$$\frac{\partial \ln L}{\partial \beta} = \sum_{i=1}^{n} \left[\frac{f_i(y_i - F_i)}{F_i(1 - F_i)}\right] \exp(-z_i'\gamma)\, x_i,$$
$$\frac{\partial \ln L}{\partial \gamma} = \sum_{i=1}^{n} \left[\frac{f_i(y_i - F_i)}{F_i(1 - F_i)}\right] \exp(-z_i'\gamma)\, z_i(-x_i'\beta), \qquad (21\text{-}34)$$

which implies a difficult log-likelihood to maximize. But if the model is estimated assuming that $\gamma = 0$, then we can easily test for homoscedasticity. Let

$$w_i = \begin{pmatrix} x_i \\ (-x_i'\hat\beta)\, z_i \end{pmatrix} \qquad (21\text{-}35)$$

computed at the maximum likelihood estimator, assuming that $\gamma = 0$. Then (21-29) or (21-30) can be used as usual for the Lagrange multiplier statistic.

Davidson and MacKinnon carried out a Monte Carlo study to examine the true sizes and power functions of these tests. As might be expected, the test for omitted variables is relatively powerful. The test for heteroscedasticity may well pick up some other form of misspecification, however, including perhaps the simple omission of $z$ from the index function, so its power may be problematic. It is perhaps not surprising that the same problem arose earlier in our test for heteroscedasticity in the linear regression model.

Example 21.4  Specification Tests in a Labor Force Participation Model

Using the data described in Example 21.1, we fit a probit model for labor force participation based on the specification

Prob[LFP = 1] = F(constant, age, age², family income, education, kids)

For these data, P = 428/753 = 0.568393. The restricted (all slopes equal zero, free constant term) log-likelihood is 325 × ln(325/753) + 428 × ln(428/753) = −514.8732. The unrestricted log-likelihood for the probit model is −490.84784. The chi-squared statistic is, therefore, 48.05072. The critical value from the chi-squared distribution with 5 degrees of freedom is 11.07, so the joint hypothesis that the coefficients on age, age², family income, education and kids are all zero is rejected.

Consider a second hypothesis, that the constant term and the coefficients on age, age², family income and education are the same whether kids equals one or zero, against the alternative that an altogether different equation applies for the two groups of women, those with kids = 1 and those with kids = 0. To test this hypothesis, we would use a counterpart to the Chow test of Section 7.4 and Example 7.6. The restricted model in this instance would be based on the pooled data set of all 753 observations. The log-likelihood for the pooled model, which has a constant term, age, age², family income and education, is −496.8663. The log-likelihoods for this model based on the 428 observations with kids = 1 and the 325 observations with kids = 0 are −347.87441 and −141.60501, respectively. The log-likelihood for the unrestricted model with separate coefficient vectors is thus the sum, −489.47942. The chi-squared statistic for testing the five restrictions of the pooled model is twice the difference, LR = 2[−489.47942 − (−496.8663)] = 14.7738. The 95 percent critical value from the chi-squared distribution with 5 degrees of freedom is 11.07, so at this significance level, the hypothesis that the constant terms and the coefficients on age, age², family income and education are the same is rejected. (The 99 percent critical value is 15.09.)

TABLE 21.3  Estimated Coefficients

                    Estimate (Std. Er.)    Marg. Effect*         Estimate (Std. Er.)    Marg. Effect*
Constant    β1      −4.157 (1.402)         -                     −6.030 (2.498)         -
Age         β2       0.185 (0.0660)        −0.0079 (0.0027)       0.264 (0.118)         −0.0088 (0.00251)
Age²        β3      −0.0024 (0.00077)      -                     −0.0036 (0.0014)       -
Income      β4       0.0458 (0.0421)        0.0180 (0.0165)       0.424 (0.222)          0.0552 (0.0240)
Education   β5       0.0982 (0.0230)        0.0385 (0.0090)       0.140 (0.0519)         0.0289 (0.00869)
Kids        β6      −0.449 (0.131)         −0.171 (0.0480)       −0.879 (0.303)         −0.167 (0.0779)
Kids        γ1       0.000                 -                     −0.141 (0.324)         -
Income      γ2       0.000                 -                      0.313 (0.123)         -
Log L               −490.8478                                    −487.6356
Correct Preds.      0s: 106, 1s: 357                             0s: 115, 1s: 358

*Marginal effect and estimated standard error include both mean (β) and variance (γ) effects.

Table 21.3 presents estimates of the probit model now with a correction for heteroscedasticity of the form

$$\text{Var}[\varepsilon_i] = \exp(\gamma_1\, \text{kids} + \gamma_2\, \text{family income}).$$

The three tests for homoscedasticity give

LR = 2[−487.6356 − (−490.8478)] = 6.424,
LM = 2.236 based on the BHHH estimator,
Wald = 6.533 (2 restrictions).

The 95 percent critical value for two restrictions is 5.99, so the LM statistic conflicts with the other two.

21.4.4.c A Specification Test for Nonnested Models: Testing for the Distribution

Whether the logit or probit form, or some third alternative, is the best specification for a discrete choice model is a perennial question. Since the distributions are not nested within some higher level model, testing for an answer is always problematic. Building on the logic of the PE test discussed in Section 9.4.3, Silva (2001) has suggested a score test which may be useful in this regard. The statistic is intended for a variety of discrete choice models, but is especially convenient for binary choice models which are based on a common single index formulation: the probability model is $\text{Prob}(y_i = 1 \,|\, x_i) = F(x_i'\beta)$. Let “1” denote Model 1 based on parameter vector $\beta$ and “2” denote Model 2 with parameter vector $\gamma$, and let Model 1 be the null specification while Model 2 is the alternative. A “super-model” which combines the two alternatives would have likelihood function

$$L_\rho = \frac{[(1 - \alpha)L_1(y \,|\, X, \beta)^\rho + \alpha L_2(y \,|\, X, \gamma)^\rho]^{1/\rho}}{\int_z [(1 - \alpha)L_1(z \,|\, X, \beta)^\rho + \alpha L_2(z \,|\, X, \gamma)^\rho]^{1/\rho}\, dz}.$$

(Note that integration is used generically here, since $y$ is discrete.) The two mixing parameters are $\rho$ and $\alpha$. Silva derives an LM test in this context for the hypothesis $\alpha = 0$ for any particular value of $\rho$. The case when $\rho = 0$ is of particular interest. As he notes, it is the nonlinear counterpart to the Cox test we examined in Section 8.3.4. [For related results, see Pesaran and Pesaran (1993), Davidson and MacKinnon (1984, 1993), Orme (1994), and Weeks (1996).] For binary choice models, Silva suggests the following procedure (as one of three computational strategies): Compute the parameters of the competing models by maximum likelihood and obtain predicted probabilities for $y_i = 1$, $\hat P_i^m$, where “$i$” denotes the observation and “$m$” = 1 or 2 for the two models.[15] The individual observations on the density for the null model, $\hat f_i^m$, are also required. The new variable

$$z_i(0) = \frac{\hat P_i^1(1 - \hat P_i^1)}{\hat f_i^1}\, \ln\!\left[\frac{\hat P_i^1(1 - \hat P_i^2)}{\hat P_i^2(1 - \hat P_i^1)}\right]$$

is then computed. Finally, Model 1 is reestimated with $z_i(0)$ added as an additional independent variable. A test of the hypothesis that its coefficient is zero is equivalent to a test of the null hypothesis that $\alpha = 0$, which favors Model 1. Rejection of the hypothesis favors Model 2. Silva’s preferred procedure is the same as this based on

$$z_i(1) = \frac{\hat P_i^2 - \hat P_i^1}{\hat f_i^1}.$$

As suggested by the citations above, tests of this sort have a long history in this literature. Silva’s simulation study for the Cox test ($\rho = 0$) and his score test ($\rho = 1$) suggests that the power of the test is quite erratic.

[15] His conjecture about the computational burden is probably overstated given that modern software offers a variety of binary choice models essentially in push-button fashion.

21.4.5 MEASURING GOODNESS OF FIT

There have been many fit measures suggested for QR models.[16] At a minimum, one should report the maximized value of the log-likelihood function, $\ln L$. Since the hypothesis that all the slopes in the model are zero is often interesting, the log-likelihood computed with only a constant term, $\ln L_0$ [see (21-28)], should also be reported. An analog to the $R^2$ in a conventional regression is McFadden’s (1974) likelihood ratio index,

$$LRI = 1 - \frac{\ln L}{\ln L_0}.$$

This measure has an intuitive appeal in that it is bounded by zero and one. If all the slope coefficients are zero, then it equals zero. There is no way to make LRI equal 1, although one can come close. If $F_i$ is always one when $y$ equals one and zero when $y$ equals zero, then $\ln L$ equals zero (the log of one) and LRI equals one. It has been suggested that this finding is indicative of a “perfect fit” and that LRI increases as the fit of the model improves. To a degree, this point is true (see the analysis in Section 21.6.6). Unfortunately, the values between zero and one have no natural interpretation. If $F(x_i'\beta)$ is a proper cdf, then even with many regressors the model cannot fit perfectly unless $x_i'\beta$ goes to $+\infty$ or $-\infty$. As a practical matter, it does happen. But when it does, it indicates a flaw in the model, not a good fit. If the range of one of the independent variables contains a value, say $x^*$, such that the sign of $(x - x^*)$ predicts $y$ perfectly and vice versa, then the model will become a perfect predictor. This result also holds in general if the sign of $x'\beta$ gives a perfect predictor for some vector $\beta$.[17] For example, one might mistakenly include as a regressor a dummy variable that is identical, or nearly so, to the dependent variable. In this case, the maximization procedure will break down precisely because $x'\beta$ is diverging during the iterations. [See McKenzie (1998) for an application and discussion.] Of course, this situation is not at all what we had in mind for a good fit.

[16] See, for example, Cragg and Uhler (1970), Amemiya (1981), Maddala (1983), McFadden (1974), Ben-Akiva and Lerman (1985), Kay and Little (1986), Veall and Zimmermann (1992), Zavoina and McKelvey (1975), Efron (1978), and Cramer (1999). A survey of techniques appears in Windmeijer (1995).
[17] See McFadden (1984) and Amemiya (1985). If this condition holds, then gradient methods will find that $\beta$.

Other fit measures have been suggested. Ben-Akiva and Lerman (1985) and Kay and Little (1986) suggested a fit measure that is keyed to the prediction rule,

$$R^2_{BL} = \frac{1}{n}\sum_{i=1}^{n} \left[ y_i \hat F_i + (1 - y_i)(1 - \hat F_i) \right],$$

which is the average probability of correct prediction by the prediction rule. The difficulty in this computation is that in unbalanced samples, the less frequent outcome will usually be predicted very badly by the standard procedure, and this measure does not pick that point up. Cramer (1999) has suggested an alternative measure that directly measures this failure,

$$\lambda = (\text{average } \hat F \,|\, y_i = 1) - (\text{average } \hat F \,|\, y_i = 0)$$
$$\phantom{\lambda} = (\text{average } (1 - \hat F) \,|\, y_i = 0) - (\text{average } (1 - \hat F) \,|\, y_i = 1).$$

Cramer’s measure heavily penalizes the incorrect predictions, and because each proportion is taken within the subsample, it is not unduly influenced by the large proportionate size of the group of more frequent outcomes. Some of the other proposed fit measures are Efron’s (1978)

$$R^2_{Ef} = 1 - \frac{\sum_{i=1}^{n}(y_i - \hat p_i)^2}{\sum_{i=1}^{n}(y_i - \bar y)^2},$$

Veall and Zimmermann’s (1992)

$$R^2_{VZ} = \left(\frac{\delta - 1}{\delta - LRI}\right) LRI, \qquad \delta = \frac{n}{2 \log L_0},$$

and Zavoina and McKelvey’s (1975)

$$R^2_{MZ} = \frac{\sum_{i=1}^{n}\left(\hat\beta'x_i - \overline{\hat\beta'x}\right)^2}{n + \sum_{i=1}^{n}\left(\hat\beta'x_i - \overline{\hat\beta'x}\right)^2}.$$

The last of these measures corresponds to the regression variation divided by the total variation in the latent index function model, where the disturbance variance is $\sigma^2 = 1$. The values of several of these statistics are given with the model results in Example 21.4 for illustration.

A useful summary of the predictive ability of the model is a 2 × 2 table of the hits and misses of a prediction rule such as

$$\hat y = 1 \text{ if } \hat F > F^* \text{ and 0 otherwise.} \qquad (21\text{-}36)$$

The usual threshold value is 0.5, on the basis that we should predict a one if the model says a one is more likely than a zero. It is important not to place too much emphasis on this measure of goodness of fit, however. Consider, for example, the naive predictor

$$\hat y = 1 \text{ if } P > 0.5 \text{ and 0 otherwise,} \qquad (21\text{-}37)$$

where $P$ is the simple proportion of ones in the sample. This rule will always predict correctly 100P percent of the observations (or 100(1 − P) percent if P < 0.5), which means that the naive model does not have zero fit. In fact, if the proportion of ones in the sample is very high, it is possible to construct examples in which the second model will generate more correct predictions than the first! Once again, this flaw is not in the model; it is a flaw in the fit measure.[18] The important element to bear in mind is that the coefficients of the estimated model are not chosen so as to maximize this (or any other) fit measure, as they are in the linear regression model where $b$ maximizes $R^2$. (The maximum score estimator discussed below addresses this issue directly.)

[18] See Amemiya (1981).

Another consideration is that 0.5, although the usual choice, may not be a very good value to use for the threshold. If the sample is unbalanced (that is, has many more ones than zeros, or vice versa), then by this prediction rule it might never predict a one (or zero). To consider an example, suppose that in a sample of 10,000 observations, only 1000 have Y = 1. We know that the average predicted probability in the sample will be 0.10. As such, it may require an extreme configuration of regressors even to produce an $F$ of 0.2, to say nothing of 0.5. In such a setting, the prediction rule may fail every time to predict when Y = 1. The obvious adjustment is to reduce $F^*$. Of course, this adjustment comes at a cost. If we reduce the threshold $F^*$ so as to predict $y = 1$ more often, then we will increase the number of correct classifications of observations that do have $y = 1$, but we will also increase the number of times that we incorrectly classify as ones observations that have $y = 0$.[19] In general, any prediction rule of the form in (21-36) will make two types of errors: It will incorrectly classify zeros as ones and ones as zeros. In practice, these errors need not be symmetric in the costs that result. For example, in a credit scoring model [see Boyes, Hoffman, and Low (1989)], incorrectly classifying an applicant as a bad risk is not the same as incorrectly classifying a bad risk as a good one. Changing $F^*$ will always reduce the probability of one type of error while increasing the probability of the other. There is no correct answer as to the best value to choose. It depends on the setting and on the criterion function upon which the prediction rule depends.

[19] The technique of discriminant analysis is used to build a procedure around this consideration. In this setting, we consider not only the number of correct and incorrect classifications, but the cost of each type of misclassification.

The likelihood ratio index and Veall and Zimmermann’s modification of it are obviously related to the likelihood ratio statistic for testing the hypothesis that the coefficient vector is zero. Efron’s and Cramer’s measures listed above are oriented more toward the relationship between the fitted probabilities and the actual values. Efron’s and Cramer’s statistics are usefully tied to the standard prediction rule $\hat y = 1[\hat F > 0.5]$. The McKelvey and Zavoina measure is an analog to the regression coefficient of determination, based on the underlying regression $y^* = \beta'x + \varepsilon$. Whether these have a close relationship to any type of fit in the familiar sense is a question that needs to be studied. In some cases, it appears so. But the maximum likelihood estimator, on which all the fit measures are based, is not chosen so as to maximize a fitting criterion based on prediction of $y$ as it is in the classical regression (which maximizes $R^2$). It is chosen to maximize the joint density of the observed dependent variables. It remains an interesting question for research whether fitting $y$ well or obtaining good parameter estimates is a preferable estimation criterion. Evidently, they need not be the same thing.

Example 21.5  Prediction with a Probit Model

Tunali (1986) estimated a probit model in a study of migration, subsequent remigration, and earnings for a large sample of observations of male members of households in Turkey. Among his results, he reports the summary shown below for a probit model: The estimated model is highly significant, with a likelihood ratio test of the hypothesis that the coefficients (16 of them) are zero based on a chi-squared value of 69 with 16 degrees of freedom.[20] The model predicts 491 of 690, or 71.2 percent, of the observations correctly, although the likelihood ratio index is only 0.083. A naive model, which always predicts that $y = 0$ because $P < 0.5$, predicts 487 of 690, or 70.6 percent, of the observations correctly. This result is hardly suggestive of no fit. The maximum likelihood estimator produces several significant influences on the probability but makes only four more correct predictions than the naive predictor.[21]

                     Predicted
                 D = 0    D = 1    Total
Actual  D = 0     471       16      487
        D = 1     183       20      203
Total             654       36      690

[20] This view actually understates slightly the significance of his model, because the preceding predictions are based on a bivariate model. The likelihood ratio test fails to reject the hypothesis that a univariate model applies, however.
[21] It is also noteworthy that nearly all the correct predictions of the maximum likelihood estimator are the zeros. It hits only 10 percent of the ones in the sample.

21.4.6 ANALYSIS OF PROPORTIONS DATA

Data for the analysis of binary responses will be in one of two forms. The data we have considered thus far are individual; each observation consists of $[y_i, x_i]$, the actual response of an individual and associated regressor vector. Grouped data usually consist of counts or proportions. Grouped data are obtained by observing the response of $n_i$ individuals, all of whom have the same $x_i$. The observed dependent variable will consist of the proportion $P_i$ of the $n_i$ individuals $ij$ who respond with $y_{ij} = 1$. An observation is thus $[n_i, P_i, x_i]$, $i = 1, \ldots, N$. Election data are typical.[22] In the grouped data setting, it is possible to use regression methods as well as maximum likelihood procedures to analyze the relationship between $P_i$ and $x_i$. The observed $P_i$ is an estimate of the population quantity, $\pi_i = F(x_i'\beta)$. If we treat this problem as a simple one of sampling from a Bernoulli population, then, from basic statistics, we have

$$P_i = F(\beta'x_i) + \varepsilon_i = \pi_i + \varepsilon_i,$$

where

$$E[\varepsilon_i] = 0, \qquad \text{Var}[\varepsilon_i] = \frac{\pi_i(1 - \pi_i)}{n_i}. \qquad (21\text{-}38)$$

[22] The earliest work on probit modeling involved applications of grouped data in laboratory experiments. Each observation consisted of $n_i$ subjects receiving dosage $x_i$ of some treatment, such as an insecticide, and a proportion $P_i$ “responding” to the treatment, usually by dying. Finney (1971) and Cox (1970) are useful and early surveys of this literature.

This heteroscedastic regression format suggests that the parameters could be estimated by a nonlinear weighted least squares regression. But there is a simpler way to proceed. Since the function $F(x_i'\beta)$ is strictly monotonic, it has an inverse. (See Figure 21.1.) Consider, then, a Taylor series approximation to this function around the point $\varepsilon_i = 0$, that is, around the point $P_i = \pi_i$,

$$F^{-1}(P_i) = F^{-1}(\pi_i + \varepsilon_i) \approx F^{-1}(\pi_i) + \left[\frac{dF^{-1}(\pi_i)}{d\pi_i}\right](P_i - \pi_i).$$

But $F^{-1}(\pi_i) = x_i'\beta$ and

$$\frac{dF^{-1}(\pi_i)}{d\pi_i} = \frac{1}{F'(F^{-1}(\pi_i))} = \frac{1}{f(\pi_i)},$$

where $f(\pi_i)$ denotes the density evaluated at $F^{-1}(\pi_i) = x_i'\beta$, so

$$F^{-1}(P_i) \approx x_i'\beta + \frac{\varepsilon_i}{f(\pi_i)}.$$

This equation produces a heteroscedastic linear regression,

$$F^{-1}(P_i) = z_i = x_i'\beta + u_i,$$

where

$$E[u_i \,|\, x_i] = 0 \quad \text{and} \quad \text{Var}[u_i \,|\, x_i] = \frac{\pi_i(1 - \pi_i)}{n_i[f(\pi_i)]^2}. \qquad (21\text{-}39)$$

The inverse function for the logistic model is particularly easy to obtain. If

$$\pi_i = \frac{\exp(x_i'\beta)}{1 + \exp(x_i'\beta)},$$

then

$$\ln\!\left(\frac{\pi_i}{1 - \pi_i}\right) = x_i'\beta.$$

This function is called the logit of $\pi_i$, hence the name “logit” model. For the normal distribution, the inverse function $\Phi^{-1}(\pi_i)$, called the normit of $\pi_i$, must be approximated. The usual approach is a ratio of polynomials.[23]

[23] See Abramovitz and Stegun (1971) and Section E.5.2. The function normit + 5 is called the probit of $P_i$. The term dates from the early days of this analysis, when the avoidance of negative numbers was a simplification with considerable payoff.

Weighted least squares regression based on (21-39) produces the minimum chi-squared estimator (MCSE) of $\beta$. Since the weights are functions of the unknown parameters, a two-step procedure is called for. As always, simple least squares at the first step produces consistent but inefficient estimates. Then the estimated variances

$$w_i = \frac{\hat\Phi_i(1 - \hat\Phi_i)}{n_i \hat\phi_i^2}$$

for the probit model or

$$w_i = \frac{1}{n_i \hat\Lambda_i(1 - \hat\Lambda_i)}$$

for the logit model based on the first-step estimates can be used for weighted least squares.[24] An iteration can then be set up,

$$\hat\beta^{(k+1)} = \left[\sum_{i=1}^{n} \frac{1}{\hat w_i^{(k)}}\, x_i x_i'\right]^{-1} \left[\sum_{i=1}^{n} \frac{1}{\hat w_i^{(k)}}\, x_i F^{-1}\!\left(\hat\pi_i^{(k)}\right)\right],$$

where “(k)” indicates the kth iteration and “∧” indicates computation of the quantity at the current (kth) estimate of $\beta$. The MCSE has the same asymptotic properties as the maximum likelihood estimator at every step after the first, so, in fact, iteration is not necessary. Although they have the same probability limit, the MCSE is not algebraically the same as the MLE, and in a finite sample, they will differ numerically.

[24] Simply using $p_i$ and $f[F^{-1}(P_i)]$ might seem to be a simple expedient in computing the weights. But this method would be analogous to using $y_i^2$ instead of an estimate of $\sigma_i^2$ in a heteroscedastic regression. Fitted probabilities and, for the probit model, densities should be based on a consistent estimator of the parameters.

The log-likelihood function for a binary choice model with grouped data is

$$\ln L = \sum_{i=1}^{n} n_i \left\{ P_i \ln F(x_i'\beta) + (1 - P_i)\ln[1 - F(x_i'\beta)] \right\}.$$

The likelihood equation that defines the maximum likelihood estimator is

$$\frac{\partial \ln L}{\partial \beta} = \sum_{i=1}^{n} n_i \left[ P_i \frac{f(x_i'\beta)}{F(x_i'\beta)} - (1 - P_i)\frac{f(x_i'\beta)}{1 - F(x_i'\beta)} \right] x_i = 0.$$

This equation closely resembles the solution for the individual data case, which makes sense if we view the grouped observation as $n_i$ replications of an individual observation. On the other hand, it is clear on inspection that the solution to this set of equations will not be the same as the generalized (weighted) least squares estimator suggested in the previous paragraph. For convenience, define $F_i = F(x_i'\beta)$, $f_i = f(x_i'\beta)$, and $f_i' = [f'(z) \,|\, z = x_i'\beta] = [df(z)/dz] \,|\, z = x_i'\beta$. The Hessian of the log-likelihood is

$$\frac{\partial^2 \ln L}{\partial \beta\, \partial \beta'} = \sum_{i=1}^{n} n_i \left\{ P_i \left[\frac{f_i'}{F_i} - \left(\frac{f_i}{F_i}\right)^2\right] - (1 - P_i)\left[\frac{f_i'}{1 - F_i} + \left(\frac{f_i}{1 - F_i}\right)^2\right] \right\} x_i x_i'.$$

To evaluate the expectation of the Hessian, we need only insert the expectation of the only stochastic element, $P_i$, which is $E[P_i \,|\, x_i] = F_i$. Then

$$E\left[\frac{\partial^2 \log L}{\partial \beta\, \partial \beta'}\right] = \sum_{i=1}^{n} n_i \left[ f_i' - \frac{f_i^2}{F_i} - f_i' - \frac{f_i^2}{1 - F_i} \right] x_i x_i' = -\sum_{i=1}^{n} \left[\frac{n_i f_i^2}{F_i(1 - F_i)}\right] x_i x_i'.$$

The asymptotic covariance matrix for the maximum likelihood estimator is the negative inverse of this matrix. From (21-39), we see that it is exactly equal to

$$\text{Asy.Var}[\text{minimum } \chi^2 \text{ estimator}] = [X'\Omega^{-1}X]^{-1}$$

since the diagonal elements of $\Omega^{-1}$ are precisely the values in brackets in the expression for the expected Hessian above. We conclude that although the MCSE and the MLE for this model are numerically different, they have the same asymptotic properties, consistent and asymptotically normal (the MCS estimator by virtue of the results of Chapter 10, the MLE by those in Chapter 17), and with asymptotic covariance matrix as previously given.

There is a complication in using the MCS estimator. The FGLS estimator breaks down if any of the sample proportions equals one or zero. A number of ad hoc patches have been suggested; the one that seems to be most widely used is to add or subtract a small constant, say 0.001, to or from the observed proportion when it is zero or one. The familiar results in (21-38) also suggest that when the proportion is based on a large population, the variance of the estimator can be exceedingly low. This issue will resurface in surprisingly low standard errors and high t ratios in the weighted regression. Unfortunately, that is a consequence of the model.[25] The same result will emerge in maximum likelihood estimation with grouped data.

[25] Whether the proportion should, in fact, be considered as a single observation from a distribution of proportions is a question that arises in all these cases. It is unambiguous in the bioassay cases noted earlier. But the issue is less clear with election data, especially since in these cases, the $n_i$ will represent most of if not all the potential respondents in location $i$ rather than a random sample of respondents.

21.5 EXTENSIONS OF THE BINARY CHOICE MODEL

Qualitative response models have been a growth industry in econometrics. The recent literature, particularly in the area of panel data analysis, has produced a number of new techniques.

21.5.1 RANDOM AND FIXED EFFECTS MODELS FOR PANEL DATA

The availability of high quality panel data sets on microeconomic behavior has maintained an interest in extending the models of Chapter 13 to binary (and other discrete choice) models. In this section, we will survey a few results from this rapidly growing literature.

The structural model for a possibly unbalanced panel of data would be written

$$y_{it}^* = x_{it}'\beta + \varepsilon_{it}, \quad i = 1, \ldots, n, \; t = 1, \ldots, T_i,$$
$$y_{it} = 1 \text{ if } y_{it}^* > 0, \text{ and 0 otherwise.}$$

The second line of this definition is often written

$$y_{it} = 1(x_{it}'\beta + \varepsilon_{it} > 0)$$

to indicate a variable which equals one when the condition in parentheses is true and zero when it is not. Ideally, we would like to specify that $\varepsilon_{it}$ and $\varepsilon_{is}$ are freely correlated within a group, but uncorrelated across groups. But doing so will involve computing joint probabilities from a $T_i$ variate distribution, which is generally problematic.[26] (We will return to this issue below.) A more promising approach is an effects model,

$$y_{it}^* = x_{it}'\beta + v_{it} + u_i, \quad i = 1, \ldots, n, \; t = 1, \ldots, T_i,$$
$$y_{it} = 1 \text{ if } y_{it}^* > 0, \text{ and 0 otherwise,}$$

where, as before (see Section 13.4), $u_i$ is the unobserved, individual specific heterogeneity. Once again, we distinguish between “random” and “fixed” effects models by the relationship between $u_i$ and $x_{it}$. The assumption that $u_i$ is unrelated to $x_{it}$, so that the conditional distribution $f(u_i \,|\, x_{it})$ is not dependent on $x_{it}$, produces the random effects model. Note that this places a restriction on the distribution of the heterogeneity. If that distribution is unrestricted, so that $u_i$ and $x_{it}$ may be correlated, then we have what is called the fixed effects model. The distinction does not relate to any intrinsic characteristic of the effect itself. As
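we shall see shortly, this is a modeling framework that is fraught with difficulties and unconventional estimation problems. Among them are: estimation of the random effects model requires very strong assumptions about the heterogeneity; the fixed effects model encounters an incidental parameters problem that renders the maximum likelihood estimator inconsistent. We begin with the random effects specification, then consider fixed effects and some semiparametric approaches that do not require the distinction. We conclude with a brief look at dynamic models of state dependence.[27]

[26] A “limited information” approach based on the GMM estimation method has been suggested by Avery, Hansen, and Hotz (1983). With recent advances in simulation-based computation of multinormal integrals (see Section E.5.6), some work on such a panel data estimator has appeared in the literature. See, for example, Geweke, Keane, and Runkle (1994, 1997). The GEE estimator of Diggle, Liang, and Zeger (1994) [see also, Liang and Zeger (1980) and Stata (2001)] seems to be another possibility. However, in all these cases, it must be remembered that the procedure specifies estimation of a correlation matrix for a $T_i$ vector of unobserved variables based on a dependent variable which takes only two values. We should not be too optimistic about this if $T_i$ is even moderately large.

[27] A survey of some of these results is given by Hsiao (1996). Most of Hsiao (1996) is devoted to the linear regression model. A number of studies specifically focused on discrete choice models and panel data have appeared recently, including Beck, Epstein, Jackman, and O’Halloran (2001), Arellano (2001) and Greene (2001).

21.5.1.a Random Effects Models

A specification which has the same structure as the random effects model of Section 13.4 has been implemented by Butler and Moffitt (1982). We will sketch the derivation to suggest how random effects can be handled in discrete and limited dependent variable models such as this one. Full details on estimation and inference may be found in Butler and Moffitt (1982) and Greene (1995a). We will then examine some extensions of the Butler and Moffitt model.

The random effects model specifies $\varepsilon_{it} = v_{it} + u_i$ where $v_{it}$ and $u_i$ are independent random variables with

$$E[v_{it} \mid \mathbf{X}] = 0; \quad \text{Cov}[v_{it}, v_{js} \mid \mathbf{X}] = \text{Var}[v_{it} \mid \mathbf{X}] = 1 \text{ if } i = j \text{ and } t = s; \; 0 \text{ otherwise}$$
$$E[u_i \mid \mathbf{X}] = 0; \quad \text{Cov}[u_i, u_j \mid \mathbf{X}] = \text{Var}[u_i \mid \mathbf{X}] = \sigma_u^2 \text{ if } i = j; \; 0 \text{ otherwise}$$
$$\text{Cov}[v_{it}, u_j \mid \mathbf{X}] = 0 \text{ for all } i, t, j$$

and $\mathbf{X}$ indicates all the exogenous data in the sample, $\mathbf{x}_{it}$ for all $i$ and $t$.[28] Then,

$$E[\varepsilon_{it} \mid \mathbf{X}] = 0$$
$$\text{Var}[\varepsilon_{it} \mid \mathbf{X}] = \sigma_v^2 + \sigma_u^2 = 1 + \sigma_u^2$$

and

$$\text{Corr}[\varepsilon_{it}, \varepsilon_{is} \mid \mathbf{X}] = \rho = \frac{\sigma_u^2}{1 + \sigma_u^2}.$$

The new free parameter is $\sigma_u^2 = \rho/(1-\rho)$.

[28] See Wooldridge (1999) for discussion of this assumption.

Recall that in the cross-section case, the probability associated with an observation is

$$P(y_i \mid \mathbf{x}_i) = \int_{L_i}^{U_i} f(\varepsilon_i)\,d\varepsilon_i, \quad (L_i, U_i) = (-\infty, -\mathbf{x}_i'\boldsymbol{\beta}) \text{ if } y_i = 0 \text{ and } (-\mathbf{x}_i'\boldsymbol{\beta}, +\infty) \text{ if } y_i = 1.$$

This simplifies to $\Phi[(2y_i - 1)\mathbf{x}_i'\boldsymbol{\beta}]$ for the normal distribution and $\Lambda[(2y_i - 1)\mathbf{x}_i'\boldsymbol{\beta}]$ for the logit model. In the fully general case with an unrestricted covariance matrix, the contribution of group $i$ to the likelihood would be the joint probability for all $T_i$ observations;

$$L_i = P(y_{i1},\ldots,y_{iT_i} \mid \mathbf{X}) = \int_{L_{iT_i}}^{U_{iT_i}} \cdots \int_{L_{i1}}^{U_{i1}} f(\varepsilon_{i1}, \varepsilon_{i2}, \ldots, \varepsilon_{iT_i})\,d\varepsilon_{i1}\,d\varepsilon_{i2}\cdots d\varepsilon_{iT_i}. \tag{21-40}$$

The integration of the joint density, as it stands, is impractical in most cases. The special nature of the random effects model allows a simplification, however. We can obtain the joint density of the $\varepsilon_{it}$’s by integrating $u_i$ out of the joint density of $(\varepsilon_{i1},\ldots,\varepsilon_{iT_i},u_i)$ which is

$$f(\varepsilon_{i1},\ldots,\varepsilon_{iT_i},u_i) = f(\varepsilon_{i1},\ldots,\varepsilon_{iT_i} \mid u_i)\,f(u_i).$$

So,

$$f(\varepsilon_{i1},\varepsilon_{i2},\ldots,\varepsilon_{iT_i}) = \int_{-\infty}^{+\infty} f(\varepsilon_{i1},\varepsilon_{i2},\ldots,\varepsilon_{iT_i} \mid u_i)\,f(u_i)\,du_i.$$

The advantage of this form is that conditioned on $u_i$, the $\varepsilon_{it}$’s are independent, so

$$f(\varepsilon_{i1},\varepsilon_{i2},\ldots,\varepsilon_{iT_i}) = \int_{-\infty}^{+\infty} \prod_{t=1}^{T_i} f(\varepsilon_{it} \mid u_i)\,f(u_i)\,du_i.$$

Inserting this result in (21-40) produces

$$L_i = P[y_{i1},\ldots,y_{iT_i} \mid \mathbf{X}] = \int_{L_{iT_i}}^{U_{iT_i}} \cdots \int_{L_{i1}}^{U_{i1}} \int_{-\infty}^{+\infty} \prod_{t=1}^{T_i} f(\varepsilon_{it} \mid u_i)\,f(u_i)\,du_i\,d\varepsilon_{i1}\,d\varepsilon_{i2}\cdots d\varepsilon_{iT_i}.$$

This may not look like much simplification, but in fact, it is. Since the ranges of integration are independent, we may change the order of integration;

$$L_i = P[y_{i1},\ldots,y_{iT_i} \mid \mathbf{X}] = \int_{-\infty}^{+\infty} \left[\int_{L_{iT_i}}^{U_{iT_i}} \cdots \int_{L_{i1}}^{U_{i1}} \prod_{t=1}^{T_i} f(\varepsilon_{it} \mid u_i)\,d\varepsilon_{i1}\,d\varepsilon_{i2}\cdots d\varepsilon_{iT_i}\right] f(u_i)\,du_i.$$

Conditioned on the common $u_i$, the $\varepsilon$’s are independent, so the term in square brackets is just the product of the individual probabilities. We can write this as

$$L_i = P[y_{i1},\ldots,y_{iT_i} \mid \mathbf{X}] = \int_{-\infty}^{+\infty} \left[\prod_{t=1}^{T_i} \left(\int_{L_{it}}^{U_{it}} f(\varepsilon_{it} \mid u_i)\,d\varepsilon_{it}\right)\right] f(u_i)\,du_i.$$

Now, consider the individual densities in the product. Conditioned on $u_i$, these are the now familiar probabilities for the individual observations, computed now at $\mathbf{x}_{it}'\boldsymbol{\beta} + u_i$. This produces a general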
model for random effects for the binary choice model. Collecting all the terms, we have reduced it to

$$L_i = P[y_{i1},\ldots,y_{iT_i} \mid \mathbf{X}] = \int_{-\infty}^{+\infty} \left[\prod_{t=1}^{T_i} \text{Prob}(Y_{it} = y_{it} \mid \mathbf{x}_{it}'\boldsymbol{\beta} + u_i)\right] f(u_i)\,du_i.$$

It remains to specify the distributions, but the important result thus far is that the entire computation requires only one dimensional integration. The inner probabilities may be any of the models we have considered so far, such as probit, logit, Weibull, and so on. The intricate part remaining is to determine how to do the outer integration. Butler and Moffitt’s method assuming that $u_i$ is normally distributed is fairly straightforward, so we will consider it first. We will then consider some other possibilities. For the probit model, the individual probabilities inside the product would be $\Phi[q_{it}(\mathbf{x}_{it}'\boldsymbol{\beta} + u_i)]$, where $\Phi[\cdot]$ is the standard normal CDF and $q_{it} = 2y_{it} - 1$. For the logit model, $\Phi[\cdot]$ would be replaced with the logistic probability, $\Lambda[\cdot]$. For the present, treat the entire function as a function of $u_i$, $g(u_i)$. The integral is, then

$$L_i = \int_{-\infty}^{\infty} \frac{1}{\sigma_u\sqrt{2\pi}}\, e^{-u_i^2/(2\sigma_u^2)}\, g(u_i)\,du_i.$$

Let $r_i = u_i/(\sigma_u\sqrt{2})$. Then, $u_i = (\sigma_u\sqrt{2})\,r_i = \theta r_i$ and $du_i = \theta\,dr_i$. Making the change of variable produces

$$L_i = \frac{1}{\sqrt{\pi}} \int_{-\infty}^{\infty} e^{-r_i^2}\, g(\theta r_i)\,dr_i.$$

(Several constants cancel out of the fractions.) Returning to our probit (or logit) model, we now have

$$L_i = \frac{1}{\sqrt{\pi}} \int_{-\infty}^{+\infty} e^{-r_i^2} \left[\prod_{t=1}^{T_i} \Phi\big(q_{it}(\mathbf{x}_{it}'\boldsymbol{\beta} + \theta r_i)\big)\right] dr_i.$$

The payoff to all this manipulation is that this likelihood function involves only one-dimensional integrals. The inner integrals are the CDF of the standard normal distribution or the logistic or extreme value distributions, which are simple to obtain. The function is amenable to Gauss–Hermite quadrature for computation. (Gauss–Hermite quadrature is discussed in Section E.5.4.) Assembling all the pieces, we obtain the approximation to the log-likelihood;

$$\ln L_H = \sum_{i=1}^{n} \ln\left\{\frac{1}{\sqrt{\pi}} \sum_{h=1}^{H} w_h \prod_{t=1}^{T_i} \Phi\big(q_{it}(\mathbf{x}_{it}'\boldsymbol{\beta} + \theta z_h)\big)\right\}$$

where $H$ is the number of points for the quadrature, and $w_h$ and $z_h$ are the weights and nodes for the quadrature. Maximizing this function remains a complex problem. But, it is made quite feasible by the transformations which reduce the integration to one dimension. This technique for the probit model has been incorporated in most contemporary econometric software and can be easily extended to other models.

The first and second derivatives are likewise complex but still computable by quadrature. An estimate of $\sigma_u$ is obtained from the result $\sigma_u = \theta/\sqrt{2}$ and a standard error can be obtained by dividing that for $\hat{\theta}$ by $\sqrt{2}$. The model may be adapted to the logit or any other formulation just by changing the CDF in the preceding equation from $\Phi[\cdot]$ to the logistic CDF, $\Lambda[\cdot]$, or the other appropriate CDF. The hypothesis of no cross-period correlation can be tested, in principle, using any of the three classical testing procedures we have discussed to examine the statistical significance of the estimated $\sigma_u$.

A number of authors have found the Butler and Moffitt formulation to be a satisfactory compromise between a fully unrestricted model and the cross-sectional variant that ignores the correlation altogether. A recent application that includes both group and time effects is Tauchen, Witte, and Griesinger’s (1994) study of arrests and criminal behavior. The Butler and Moffitt approach has been criticized for the restriction of equal correlation across periods. But it does have a compelling virtue that the model can be efficiently estimated even with fairly large $T_i$ using conventional computational methods. [See Greene (1995a, pp. 425–431).]

A remaining problem with the Butler and Moffitt specification is its assumption of normality. In general, other distributions are problematic because of the difficulty of finding either a closed form for the integral or a satisfactory method of approximating the integral. An alternative approach which allows some flexibility is the method of maximum simulated likelihood (MSL) which was discussed in Section 17.8. The transformed likelihood we derived above is an expectation;

$$L_i = \int_{-\infty}^{+\infty} \left[\prod_{t=1}^{T_i} \text{Prob}(Y_{it} = y_{it} \mid \mathbf{x}_{it}'\boldsymbol{\beta} + u_i)\right] f(u_i)\,du_i = E_{u_i}\left[\prod_{t=1}^{T_i} \text{Prob}(Y_{it} = y_{it} \mid \mathbf{x}_{it}'\boldsymbol{\beta} + u_i)\right].$$

This expectation can be approximated by simulation rather than quadrature. First, let $\theta$ now denote the scale parameter in the distribution of $u_i$. This would be $\sigma_u$ for a normal distribution, for example, or some other scaling for the logistic or uniform distribution. Then, write the term in the likelihood function as

$$L_i = E_{u_i}\left[\prod_{t=1}^{T_i} F(y_{it}, \mathbf{x}_{it}'\boldsymbol{\beta} + \theta u_i)\right] = E_{u_i}[h(u_i)].$$

The function is smooth, continuous, and continuously differentiable. If this expectation is finite, then the conditions of the law of large numbers should apply, which would mean that for a sample of observations $u_{i1},\ldots,u_{iR}$,

$$\text{plim}\; \frac{1}{R}\sum_{r=1}^{R} h(u_{ir}) = E_u[h(u_i)].$$

This suggests, based on the results in Chapter 17, an alternative method of maximizing the log-likelihood for the random effects model. A sample of person specific draws from the population $u_i$ can be generated with a random number generator. For the Butler and Moffitt model with normally distributed $u_i$, the simulated log-likelihood function is

$$\ln L_{\text{Simulated}} = \sum_{i=1}^{n} \ln\left\{\frac{1}{R}\sum_{r=1}^{R}\left[\prod_{t=1}^{T_i} F\big[q_{it}(\mathbf{x}_{it}'\boldsymbol{\beta} + \sigma_u u_{ir})\big]\right]\right\}.$$

This function is maximized with respect to $\boldsymbol{\beta}$ and $\sigma_u$. Note that in the preceding, as in the quadrature approximated log-likelihood, the model can be based on a probit, logit, or any other functional form desired. There is an additional degree of flexibility in this approach. The Hermite quadrature approach is essentially limited by its functional form to the normal distribution. But, in the simulation approach, $u_{ir}$ can come from some other distribution. For example, it might be believed that the dispersion of the heterogeneity is greater than implied by a normal distribution. The logistic distribution might be preferable. A random sample from the logistic distribution can be created by sampling $(w_{i1},\ldots,w_{iR})$ from the standard uniform [0, 1] distribution, then $u_{ir} = \ln(w_{ir}/(1 - w_{ir}))$. Other distributions, such as the uniform itself, are also possible.

We have examined two approaches to estimation of a probit model with random effects. GMM estimation is another possibility. Avery, Hansen, and Hotz (1983), Bertschek and Lechner (1998), and Inkmann (2000) examine this approach; the latter two offer some comparison with the quadrature and simulation based estimators considered here. (Our applications in the following Examples 16.5, 17.10, and 21.6 use the Bertschek and Lechner data.)

The preceding opens another possibility. The random effects model can be cast as a model with a random constant term;

$$y_{it}^* = \alpha_i + \mathbf{x}_{(1),it}'\boldsymbol{\beta}_{(1)} + \varepsilon_{it}, \quad i = 1,\ldots,n,\; t = 1,\ldots,T_i,$$
$$y_{it} = 1 \text{ if } y_{it}^* > 0, \text{ and } 0 \text{ otherwise}$$

where $\alpha_i = \alpha + \sigma_u u_i$. This is simply a reinterpretation of the model we just analyzed. We might, however, now extend this formulation to the full parameter vector. The resulting structure is

$$y_{it}^* = \mathbf{x}_{it}'\boldsymbol{\beta}_i + \varepsilon_{it}, \quad i = 1,\ldots,n,\; t = 1,\ldots,T_i,$$
$$y_{it} = 1 \text{ if } y_{it}^* > 0, \text{ and } 0 \text{ otherwise}$$

where $\boldsymbol{\beta}_i = \boldsymbol{\beta} + \boldsymbol{\Gamma}\mathbf{u}_i$ and $\boldsymbol{\Gamma}$ is a nonnegative definite diagonal matrix (some of its diagonal elements could be zero for nonrandom parameters). The method of estimation is essentially the same as before. The simulated log likelihood is now

$$\ln L_{\text{Simulated}} = \sum_{i=1}^{n} \ln\left\{\frac{1}{R}\sum_{r=1}^{R}\left[\prod_{t=1}^{T_i} F\big[q_{it}\big(\mathbf{x}_{it}'(\boldsymbol{\beta} + \boldsymbol{\Gamma}\mathbf{u}_{ir})\big)\big]\right]\right\}.$$

The simulation now involves $R$ draws from the multivariate distribution of $\mathbf{u}$. Since the draws are uncorrelated ($\boldsymbol{\Gamma}$ is diagonal), this is essentially the same estimation problem as the random effects model considered previously. This model is estimated in Example 17.10. Example 16.5 presents a similar model that assumes that the distribution of $\boldsymbol{\beta}_i$ is discrete rather than continuous.

21.5.1.b Fixed Effects Models

The fixed effects model is

$$y_{it}^* = \alpha_i d_{it} + \mathbf{x}_{it}'\boldsymbol{\beta} + \varepsilon_{it}, \quad i = 1,\ldots,n,\; t = 1,\ldots,T_i,$$
$$y_{it} = 1 \text{ if } y_{it}^* > 0, \text{ and } 0 \text{ otherwise}$$

where $d_{it}$ is a dummy variable which takes the value one for individual $i$ and zero otherwise. For convenience, we have redefined $\mathbf{x}_{it}$ to be the nonconstant variables in the model. The parameters to be estimated are the $K$ elements of $\boldsymbol{\beta}$ and the $n$ individual constant terms. Before we consider the several virtues and shortcomings of this model, we consider the practical aspects of estimation of what are possibly a huge number of parameters, $(n + K)$; $n$ is not limited here, and could be in the thousands in a typical application. The log likelihood function for the fixed effects model is

$$\ln L = \sum_{i=1}^{n}\sum_{t=1}^{T_i} \ln P(y_{it} \mid \alpha_i + \mathbf{x}_{it}'\boldsymbol{\beta})$$

where $P(\cdot)$ is the probability of the observed outcome, for example, $\Phi[q_{it}(\alpha_i + \mathbf{x}_{it}'\boldsymbol{\beta})]$ for the probit model or $\Lambda[q_{it}(\alpha_i + \mathbf{x}_{it}'\boldsymbol{\beta})]$ for the logit model. What follows can be extended to any index function model, but for the present, we’ll confine our attention to symmetric distributions such as the normal and logistic, so that the probability can be conveniently written as $\text{Prob}(Y_{it} = y_{it} \mid \mathbf{x}_{it}) = P[q_{it}(\alpha_i + \mathbf{x}_{it}'\boldsymbol{\beta})]$. It will be convenient to let $z_{it} = \alpha_i + \mathbf{x}_{it}'\boldsymbol{\beta}$ so $\text{Prob}(Y_{it} = y_{it} \mid \mathbf{x}_{it}) = P(q_{it}z_{it})$.

In our previous application of this model, in the linear regression case, we found that estimation of the parameters was made possible by a transformation of the data to deviations from group means which eliminated the person specific constants from the estimator. (See Section 13.3.2.) Save for the special case discussed below, that will not be possible here, so that if one desires to estimate the parameters of this model, it will be necessary actually to compute the possibly huge number of constant terms
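at the same time. This has been widely viewed as a practical obstacle to estimation of this model because of the need to invert a potentially large second derivatives matrix, but this is a misconception. [See, e.g., Maddala (1987), p. 317.] The likelihood equations for this model are

$$\frac{\partial \ln L}{\partial \alpha_i} = \sum_{t=1}^{T_i} \frac{q_{it} f(q_{it}z_{it})}{P(q_{it}z_{it})} = \sum_{t=1}^{T_i} g_{it} = g_{ii} = 0$$

and

$$\frac{\partial \ln L}{\partial \boldsymbol{\beta}} = \sum_{i=1}^{n}\sum_{t=1}^{T_i} \frac{q_{it} f(q_{it}z_{it})}{P(q_{it}z_{it})}\,\mathbf{x}_{it} = \sum_{i=1}^{n}\sum_{t=1}^{T_i} g_{it}\mathbf{x}_{it} = \mathbf{0}$$

where $f(\cdot)$ is the density that corresponds to $P(\cdot)$. For our two familiar models, $g_{it} = q_{it}\phi(q_{it}z_{it})/\Phi(q_{it}z_{it})$ for the normal and $q_{it}[1 - \Lambda(q_{it}z_{it})]$ for the logistic. Note that for these distributions, $g_{it}$ is always negative when $y_{it}$ is zero and always positive when $y_{it}$ equals one. (The use of $q_{it}$ as in the preceding assumes the distribution is symmetric. For asymmetric distributions such as the Weibull, $g_{it}$ and $h_{it}$ would be more complicated, but the central results would be the same.) The second derivatives matrix is

$$\frac{\partial^2 \ln L}{\partial \alpha_i^2} = \sum_{t=1}^{T_i}\left[\frac{f'(q_{it}z_{it})}{P(q_{it}z_{it})} - \left(\frac{f(q_{it}z_{it})}{P(q_{it}z_{it})}\right)^2\right] = \sum_{t=1}^{T_i} h_{it} = h_{ii} < 0,$$

$$\frac{\partial^2 \ln L}{\partial \boldsymbol{\beta}\,\partial \alpha_i} = \sum_{t=1}^{T_i} h_{it}\mathbf{x}_{it},$$

$$\frac{\partial^2 \ln L}{\partial \boldsymbol{\beta}\,\partial \boldsymbol{\beta}'} = \sum_{i=1}^{n}\sum_{t=1}^{T_i} h_{it}\mathbf{x}_{it}\mathbf{x}_{it}' = \mathbf{H}_{\beta\beta}, \text{ a negative semidefinite matrix.}$$

Note that the leading $q_{it}$ falls out of the second derivatives, since in each appearance $q_{it}^2 = 1$. The derivatives of the densities with respect to their arguments are $-(q_{it}z_{it})\phi(q_{it}z_{it})$ for the normal distribution and $[1 - 2\Lambda(q_{it}z_{it})]f(q_{it}z_{it})$ for the logistic. In both cases, $h_{it}$ is negative for all values of $q_{it}z_{it}$. The likelihood equations are a large system, but the solution turns out to be surprisingly straightforward. [See Greene (2001).]

By using the formula for the partitioned inverse, we find that the $K \times K$ submatrix of the inverse of the Hessian that corresponds to $\boldsymbol{\beta}$, which would provide the asymptotic covariance matrix for the MLE, is

$$\mathbf{H}^{\beta\beta} = \left\{\sum_{i=1}^{n}\left[\sum_{t=1}^{T_i} h_{it}\mathbf{x}_{it}\mathbf{x}_{it}' - \frac{1}{h_{ii}}\left(\sum_{t=1}^{T_i} h_{it}\mathbf{x}_{it}\right)\left(\sum_{t=1}^{T_i} h_{it}\mathbf{x}_{it}\right)'\right]\right\}^{-1}$$

$$= \left\{\sum_{i=1}^{n}\left[\sum_{t=1}^{T_i} h_{it}(\mathbf{x}_{it} - \bar{\mathbf{x}}_i)(\mathbf{x}_{it} - \bar{\mathbf{x}}_i)'\right]\right\}^{-1} \quad \text{where } \bar{\mathbf{x}}_i = \frac{\sum_{t=1}^{T_i} h_{it}\mathbf{x}_{it}}{h_{ii}}.$$

Note the striking similarity to the result we had for the fixed effects model in the linear case. By assembling the Hessian as a partitioned matrix for $\boldsymbol{\beta}$ and the full vector of constant terms, then using (A-66b) and the definitions above to isolate one diagonal element, we find

$$H^{\alpha_i\alpha_i} = \frac{1}{h_{ii}} + \bar{\mathbf{x}}_i'\,\mathbf{H}^{\beta\beta}\,\bar{\mathbf{x}}_i.$$

Once again, the result has the same format as its counterpart in the linear model. In principle, the negatives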
of these would be the estimators of the asymptotic variances of the maximum likelihood estimators. (Asymptotic properties in this model are problematic, as we consider below.) All of these can be computed quite easily once the parameter estimates are in hand, so that in fact, practical estimation of the model is not really the obstacle. (This must be qualified, however. Looking at the likelihood equation for a constant term, it is clear that if $y_{it}$ is the same in every period then there is no solution. For example, if $y_{it} = 1$ in every period, then $\partial \ln L/\partial \alpha_i$ must be positive, so it cannot be equated to zero with finite coefficients. Such groups would have to be removed from the sample in order to fit this model.) It is shown in Greene (2001) that, in spite of the potentially large number of parameters in the model, Newton’s method can be used with the following iteration, which uses only the $K \times K$ matrix computed above and a few $K \times 1$ vectors:

$$\hat{\boldsymbol{\beta}}^{(s+1)} = \hat{\boldsymbol{\beta}}^{(s)} - \left\{\sum_{i=1}^{n}\sum_{t=1}^{T_i} h_{it}(\mathbf{x}_{it} - \bar{\mathbf{x}}_i)(\mathbf{x}_{it} - \bar{\mathbf{x}}_i)'\right\}^{-1}\left\{\sum_{i=1}^{n}\sum_{t=1}^{T_i} g_{it}(\mathbf{x}_{it} - \bar{\mathbf{x}}_i)\right\} = \hat{\boldsymbol{\beta}}^{(s)} + \boldsymbol{\Delta}_{\beta}^{(s)}$$

and

$$\hat{\alpha}_i^{(s+1)} = \hat{\alpha}_i^{(s)} - \left[(g_{ii}/h_{ii}) + \bar{\mathbf{x}}_i'\boldsymbol{\Delta}_{\beta}^{(s)}\right].\text{[29]}$$

This is a large amount of computation involving many summations, but it is linear in the number of parameters and does not involve any $n \times n$ matrices.

The problems with the fixed effects estimator are statistical, not practical.[30] The estimator relies on $T_i$ increasing for the constant terms to be consistent; in essence, each $\alpha_i$ is estimated with $T_i$ observations. But, in this setting, not only is $T_i$ fixed, it is likely to be quite small. As such, the estimators of the constant terms are not consistent (not because they converge to something other than what they are trying to estimate, but because they do not converge at all). The estimator of $\boldsymbol{\beta}$ is a function of the estimators of $\alpha$, which means that the MLE of $\boldsymbol{\beta}$ is not consistent either. This is the incidental parameters problem. [See Neyman and Scott (1948) and Lancaster (2000).] There is, as well, a small sample (small $T_i$) bias in the estimators. How serious this bias is remains a question in the literature. Two pieces of received wisdom are Hsiao’s (1986) results for a binary logit model and Heckman and MaCurdy’s (1980) results for the probit model. Hsiao found that for $T_i = 2$, the bias in the MLE of $\boldsymbol{\beta}$ is 100 percent, which is extremely pessimistic. Heckman and MaCurdy found in a Monte
Carlo study that in samples of $n = 100$ and $T = 8$, the bias appeared to be on the order of 10 percent, which is substantive, but certainly less severe than Hsiao’s results suggest. The fixed effects approach does have some appeal in that it does not require an assumption of orthogonality of the independent variables and the heterogeneity. An ongoing pursuit in the literature is concerned with the severity of the tradeoff of this virtue against the incidental parameters problem. Some commentary on this issue appears in Arellano (2001).

Why did the incidental parameters problem arise here and not in the linear regression model? Recall that estimation in the regression model was based on the deviations from group means, not the original data as it is here. The result we exploited there was that although $f(y_{it} \mid \mathbf{X}_i)$ is a function of $\alpha_i$, $f(y_{it} \mid \mathbf{X}_i, \bar{y}_i)$ is not a function of $\alpha_i$, and we used the latter in estimation of $\boldsymbol{\beta}$. In that setting, $\bar{y}_i$ is a minimal sufficient statistic for $\alpha_i$. Sufficient statistics are available for a few distributions that we will examine, but not for the probit model. They are available for the logit model, as we now examine.

[29] Similar results appear in Prentice and Gloeckler (1978) who attribute it to Rao (1973), and Chamberlain (1983).

[30] See Vytlacil, Aakvik and Heckman (2002), Chamberlain (1980, 1984), Newey (1994), Bover and Arellano (1997) and Chen (1998) for some extensions of parametric forms of the binary choice models with fixed effects.

A fixed effects binary logit model is

$$\text{Prob}(y_{it} = 1 \mid \mathbf{x}_{it}) = \frac{e^{\alpha_i + \mathbf{x}_{it}'\boldsymbol{\beta}}}{1 + e^{\alpha_i + \mathbf{x}_{it}'\boldsymbol{\beta}}}.$$

The unconditional likelihood for the $nT$ independent observations is

$$L = \prod_{i}\prod_{t} (F_{it})^{y_{it}}(1 - F_{it})^{1 - y_{it}}.$$

Chamberlain (1980) [following Rasch (1960) and Anderson (1970)] observed that the conditional likelihood function,

$$L^c = \prod_{i=1}^{n} \text{Prob}\left(Y_{i1} = y_{i1}, Y_{i2} = y_{i2}, \ldots, Y_{iT_i} = y_{iT_i} \,\Big|\, \sum_{t=1}^{T_i} y_{it}\right),$$

is free of the incidental parameters, $\alpha_i$. The joint likelihood for each set of $T_i$ observations conditioned on the number of ones in the set is

$$\text{Prob}\left(Y_{i1} = y_{i1}, Y_{i2} = y_{i2}, \ldots, Y_{iT_i} = y_{iT_i} \,\Big|\, \sum_{t=1}^{T_i} y_{it}, \text{data}\right) = \frac{\exp\left(\sum_{t=1}^{T_i} y_{it}\mathbf{x}_{it}'\boldsymbol{\beta}\right)}{\sum_{\Sigma_t d_{it} = S_i} \exp\left(\sum_{t=1}^{T_i} d_{it}\mathbf{x}_{it}'\boldsymbol{\beta}\right)}.$$

The function in the denominator is summed over the set of all $\binom{T_i}{S_i}$ different sequences of $T_i$ zeros and ones that have the same sum as $S_i = \sum_{t=1}^{T_i} y_{it}$.[31]

[31] The enumeration of all these computations stands to be quite a burden; see Arellano (2000, p. 47) or Baltagi (1995, p. 180) who [citing Greene (1993)] suggests that $T_i > 10$ would be excessive. In fact, using a recursion suggested by Krailo and Pike (1984), the computation even with $T_i$ up to 100 is routine.

Consider the example of $T_i = 2$. The unconditional likelihood is

$$L = \prod_{i} \text{Prob}(Y_{i1} = y_{i1})\,\text{Prob}(Y_{i2} = y_{i2}).$$

For each pair of observations, we have these possibilities:

1. $y_{i1} = 0$ and $y_{i2} = 0$. Prob(0, 0 | sum = 0) = 1.
2. $y_{i1} = 1$ and $y_{i2} = 1$. Prob(1, 1 | sum = 2) = 1.

The $i$th term in $L^c$ for either of these is just one, so they contribute nothing to the conditional likelihood function.[32] When we take logs, these terms (and these observations) will drop out. But suppose that $y_{i1} = 0$ and $y_{i2} = 1$. Then

3. $$\text{Prob}(0, 1 \mid \text{sum} = 1) = \frac{\text{Prob}(0, 1 \text{ and sum} = 1)}{\text{Prob}(\text{sum} = 1)} = \frac{\text{Prob}(0, 1)}{\text{Prob}(0, 1) + \text{Prob}(1, 0)}.$$

[32] Recall in the probit model when we encountered this situation, the individual constant term could not be estimated and the group was removed from the sample. The same effect is at work here.

Therefore, for this pair of observations, the conditional probability is

$$\frac{\dfrac{1}{1 + e^{\alpha_i + \mathbf{x}_{i1}'\boldsymbol{\beta}}}\,\dfrac{e^{\alpha_i + \mathbf{x}_{i2}'\boldsymbol{\beta}}}{1 + e^{\alpha_i + \mathbf{x}_{i2}'\boldsymbol{\beta}}}}{\dfrac{1}{1 + e^{\alpha_i + \mathbf{x}_{i1}'\boldsymbol{\beta}}}\,\dfrac{e^{\alpha_i + \mathbf{x}_{i2}'\boldsymbol{\beta}}}{1 + e^{\alpha_i + \mathbf{x}_{i2}'\boldsymbol{\beta}}} + \dfrac{e^{\alpha_i + \mathbf{x}_{i1}'\boldsymbol{\beta}}}{1 + e^{\alpha_i + \mathbf{x}_{i1}'\boldsymbol{\beta}}}\,\dfrac{1}{1 + e^{\alpha_i + \mathbf{x}_{i2}'\boldsymbol{\beta}}}} = \frac{e^{\mathbf{x}_{i2}'\boldsymbol{\beta}}}{e^{\mathbf{x}_{i1}'\boldsymbol{\beta}} + e^{\mathbf{x}_{i2}'\boldsymbol{\beta}}}.$$

By conditioning on the sum of the two observations, we have removed the heterogeneity. Therefore, we can construct the conditional likelihood function as the product of these terms for the pairs of observations for which the two observations are (0, 1). Pairs of observations with one and zero are included analogously. The product of the terms such as the preceding, for those observation sets for which the sum is not zero or $T_i$, constitutes the conditional likelihood. Maximization of the resulting function is straightforward and may be done by conventional methods.

As in the linear regression model, it is of some interest to test whether there is indeed heterogeneity. With homogeneity ($\alpha_i = \alpha$), there is no unusual problem, and the model can be estimated, as usual, as a logit model. It is not possible to test the hypothesis using the likelihood ratio test, however, because the two likelihoods are not comparable. (The conditional likelihood is based on a restricted data set.) None of the usual tests of restrictions can be used because the individual effects are never actually estimated.[33] Hausman’s (1978) specification test is a natural one to use here, however. Under the null hypothesis of homogeneity, both Chamberlain’s conditional maximum likelihood estimator (CMLE) and the usual maximum likelihood estimator are consistent, but Chamberlain’s is inefficient. (It fails to use the information that $\alpha_i = \alpha$, and it may not use all the data.) Under the alternative hypothesis, the unconditional maximum likelihood estimator is inconsistent,[34] whereas Chamberlain’s estimator is consistent and efficient. The Hausman test can be based on the chi-squared statistic

$$\chi^2 = (\hat{\boldsymbol{\beta}}_{CML} - \hat{\boldsymbol{\beta}}_{ML})'\,(\text{Var}[CML] - \text{Var}[ML])^{-1}\,(\hat{\boldsymbol{\beta}}_{CML} - \hat{\boldsymbol{\beta}}_{ML}).$$

The estimated covariance matrices are those computed for the two maximum likelihood estimators. For the unconditional maximum likelihood estimator, the row and column corresponding to the constant term are dropped. A large value will cast doubt on the hypothesis of homogeneity. (There are $K$ degrees of freedom for the test.) It is possible that the covariance matrix for the maximum likelihood estimator will be larger than that for the conditional maximum likelihood estimator. If so, then the difference matrix in brackets is assumed to be a zero matrix, and the chi-squared statistic is therefore zero.

[33] This produces a difficulty for this estimator that is shared by the semiparametric estimators discussed in the next section. Since the fixed effects are not estimated, it is not possible to compute probabilities or marginal effects with these estimated coefficients, and it is a bit ambiguous what one can do with the results of the computations. The brute force estimator that actually computes the individual effects might be preferable.

[34] Hsiao (1996) derives the result explicitly for some particular cases.

Example 21.6 Individual Effects in a Binary Choice Model

To illustrate the fixed and random effects estimators, we continue the analyses of Examples 16.5 and 17.10.[35] The binary dependent variable is

$y_{it}$ = 1 if firm $i$ realized a product innovation in year $t$ and 0 if not.

The sample consists of 1,270 German firms observed for 5 years, 1984–1988. Independent variables in the model that we formulated were

$x_{it1}$ = constant,
$x_{it2}$ = log of sales,
$x_{it3}$ = relative size = ratio of employment in business unit to employment in the industry,
$x_{it4}$ = ratio of industry imports to (industry sales + imports),
$x_{it5}$ = ratio of industry foreign direct investment to (industry sales + imports),
$x_{it6}$ = productivity = ratio of industry value added to industry employment.

Latent class and random parameters models were fit to these data in Examples 16.5 and 17.10. (For this example, we have dropped the two sector dummy variables as they are constant across periods, which would preclude estimation of the fixed effects models.) Table 21.4 presents estimates of the probit and logit models with individual effects. The differences across the models are quite large. Note, for example, that the coefficients on the sales and FDI variables, both of which are highly significant in the base case, change sign in the fixed effects model. (The random effects logit model is estimated by appending a normally distributed individual effect to the model and using the Butler and Moffitt method described earlier.) The evidence of heterogeneity in the data is quite substantial. The simple likelihood ratio tests of either panel data form against the base case lead to rejection of the restricted model. (The fixed effects logit model cannot be used for this test because it is based on the conditional log likelihood whereas the other two forms are based on unconditional likelihoods. It was not possible to fit the logit model with the full set of fixed effects. The relative size variable has some, but not enough, within group variation, and the model became unstable after only a few iterations.) The Hausman statistic based on the logit estimates equals 19.59. The 95 percent critical value from the chi-squared distribution with 5 degrees of freedom is 11.07, so based on the logit estimates, we would reject the homogeneity restriction. In this setting, unlike in the linear model (see Section 13.4.4), neither the probit nor the logit model provides a means of testing for whether the random or fixed effects model is preferred.

[35] The data are from Bertschek and Lechner (1998). Description of the data appears in Example 16.5 and in the original paper.

21.5.2 SEMIPARAMETRIC ANALYSIS

In his survey of qualitative response models, Amemiya (1981) reports the following widely cited approximations for the linear probability (LP) model: Over the range of probabilities of 30 to 70 percent,

$$\hat{\boldsymbol{\beta}}_{LP} \approx 0.4\,\boldsymbol{\beta}_{\text{probit}} \text{ for the slopes,}$$
$$\hat{\boldsymbol{\beta}}_{LP} \approx 0.25\,\boldsymbol{\beta}_{\text{logit}} \text{ for the slopes.[36]}$$
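As a rough numerical check on the scale factors 0.4 and 0.25 in these approximations, the following sketch (not part of the original text) evaluates the slope of the probability with respect to the index for the two models: the standard normal density at $\Phi^{-1}(P)$ for the probit, and $P(1 - P)$ for the logit. The function name `scale_factors` is hypothetical, and only the Python standard library is assumed.

```python
from statistics import NormalDist


def scale_factors(p):
    """Return (probit, logit) marginal-effect scale factors at probability p.

    Probit: the standard normal density evaluated at the index z that
    satisfies Phi(z) = p. Logit: Lambda(z) * (1 - Lambda(z)) = p * (1 - p).
    """
    if not 0.0 < p < 1.0:
        raise ValueError("p must lie strictly between 0 and 1")
    std_normal = NormalDist()
    z = std_normal.inv_cdf(p)  # index value with Phi(z) = p
    return std_normal.pdf(z), p * (1.0 - p)


if __name__ == "__main__":
    # Tabulate the factors over the 30 to 70 percent range cited in the text.
    for p in (0.3, 0.4, 0.5, 0.6, 0.7):
        probit, logit = scale_factors(p)
        print(f"P={p:.1f}  probit={probit:.4f}  logit={logit:.4f}  "
              f"ratio={probit / logit:.3f}")
```

At $P = 0.5$ the two factors are about 0.399 and exactly 0.25, so their ratio is close to 1.6. Both factors shrink as $P$ moves toward 0.3 or 0.7, which is one way to see why the approximation is stated only for the middle of the probability range.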
[36] An additional 0.5 is added for the constant term in both models.

TABLE 21.4  Estimated Panel Data Models. (Standard Errors in Parentheses; Marginal Effects in Brackets.)

                       Probit                             Logit
            Base       Random     Fixed      Base       Random     Fixed
Constant    −2.35      −3.51      —          −3.83      −0.751     —
            (0.214)    (0.502)               (0.351)    (0.611)
ln Sales    0.243      0.353      −0.650     0.408      0.429      −0.863
            (0.194)    (0.448)    (0.355)    (0.0323)   (0.547)    (0.530)
            [0.094]    [0.088]    [−0.255]   [0.097]    [0.103]
RelSize     1.17       1.59       0.278      2.16       1.36       0.340
            (0.141)    (0.241)    (0.734)    (0.272)    (0.296)    (1.06)
            [0.450]    [0.398]    [0.110]    [0.517]    [0.328]
Imports     0.909      1.40       3.50       1.49       0.858      4.69
            (0.143)    (0.343)    (2.92)     (0.232)    (0.418)    (4.34)
            [0.350]    [0.351]    [1.38]     [0.356]    [0.207]
FDI         3.39       4.55       −8.13      5.75       1.98       −10.44
            (0.394)    (0.828)    (3.38)     (0.705)    (1.01)     (5.01)
            [1.31]     [1.14]     [−3.20]    [1.37]     [0.477]
Prod        −4.71      −5.62      5.30       −9.33      −1.76      6.64
            (0.553)    (0.753)    (4.03)     (1.13)     (0.927)    (5.93)
            [−1.82]    [−1.41]    [2.09]     [−2.29]    [−0.424]
ρ           —          0.582      —                     0.252      —
                       (0.019)                          (0.081)
Ln L        −4134.86   −3546.01   −2086.26   −4128.98   −3545.84   −1388.51

Aside from confirming our intuition that least squares approximates the nonlinear model and providing a quick comparison for the three models involved, the practical usefulness of the formula is somewhat limited. Still, it is a striking result.[37] A series of studies has focused on reasons why the least squares estimates should be proportional to the probit and logit estimates. A related question concerns the problems associated with assuming that a probit model applies when, in fact, a logit model is appropriate or vice versa.[38] The approximation would seem to suggest that with this type of misspecification, we would once again obtain a scaled version of the correct coefficient vector. (Amemiya also reports the widely observed relationship $\hat{\boldsymbol{\beta}}_{\text{logit}} = 1.6\,\hat{\boldsymbol{\beta}}_{\text{probit}}$, which follows from the results above.)

[37] This result does not imply that it is useful to report 2.5 times the linear probability estimates with the probit estimates for comparability. The linear probability estimates are already in the form of marginal effects, whereas the probit coefficients must be scaled downward. If the sample proportion happens to be close to 0.5, then the right scale factor will be roughly $\phi[\Phi^{-1}(0.5)] = 0.3989$. But the density falls rapidly as $P$ moves away from 0.5.

[38] See Ruud (1986) and Gourieroux et al. (1987).

Greene (1983), building on Goldberger (1981), finds that if the probit model is correctly specified and if the regressors are themselves joint normally distributed, then the probability limit of the least squares estimator is a multiple of the true coefficient vector.[39] Greene’s result is useful only for the same purpose as Amemiya’s quick correction of OLS. Multivariate normality is obviously inconsistent with most applications. For example, nearly all applications include at least one dummy variable. Ruud (1982) and Cheung and Goldberger (1984), however, have shown that much weaker conditions than joint normality will produce the same proportionality result. For a probit model, Cheung and Goldberger require only that $E[\mathbf{x} \mid y^*]$ be linear in $y^*$. Several authors have built on these observations to pursue the issue of what circumstances will lead to proportionality results such as these. Ruud (1986) and Stoker (1986) have extended them to a very wide class of models that goes well beyond those of Cheung and Goldberger. Curiously enough, Stoker’s results rule out dummy variables, but it is those for which the proportionality result seems to be most robust.[40]

[39] The scale factor is estimable with the sample data, so under these assumptions, a method of moments estimator is available.

[40] See Greene (1983).

21.5.3 THE MAXIMUM SCORE ESTIMATOR (MSCORE)

In Section 21.4.5, we discussed the issue of prediction rules for the probit and logit models. In contrast to the linear regression model, estimation of these binary choice models is not based on a fitting rule, such as the sum of squared residuals, which is related to the fit of the model to the data. The maximum score estimator is based on a fitting rule,

$$\text{Maximize}_{\boldsymbol{\beta}}\; S_{n\alpha}(\boldsymbol{\beta}) = \frac{1}{n}\sum_{i=1}^{n} [z_i - (1 - 2\alpha)]\,\text{sgn}(\mathbf{x}_i'\boldsymbol{\beta}).\text{[41]}$$

The parameter $\alpha$ is a preset quantile, and $z_i = 2y_i - 1$. (So $z = -1$ if $y = 0$.) If $\alpha$ is set to $\tfrac{1}{2}$, then the maximum score estimator chooses the $\boldsymbol{\beta}$ to maximize the number of times that the prediction has the same sign as $z$. This result matches our prediction rule in (21-36) with $F^* = 0.5$. So for $\alpha = 0.5$, maximum score attempts to maximize the number of correct predictions. Since the sign of $\mathbf{x}'\boldsymbol{\beta}$ is the same for all positive multiples of $\boldsymbol{\beta}$, the estimator is computed subject to the constraint that $\boldsymbol{\beta}'\boldsymbol{\beta} = 1$.

[41] See Manski (1975, 1985, 1986) and Manski and Thompson (1986). For extensions of this model, see Horowitz (1992), Charlier, Melenberg and van Soest (1995), Kyriazidou (1997) and Lee (1996).

Since there is no log-likelihood function underlying the fitting criterion, there is no information matrix to provide a method of obtaining standard errors for the estimates. Bootstrapping can be used to provide at least some idea of the sampling variability of the estimator. (See Section E.4.) The method proceeds as follows. After the set of coefficients $\mathbf{b}_n$ is computed, $R$ randomly drawn samples of $m$ observations are drawn from the original data set with replacement. The bootstrap sample size $m$ may be less than or equal to $n$, the sample size. With each such sample, the maximum score estimator is recomputed, giving $\mathbf{b}_m(r)$. Then the mean-squared deviation matrix

$$\text{MSD}(\mathbf{b}) = \frac{1}{R}\sum_{r=1}^{R} [\mathbf{b}_m(r) - \mathbf{b}_n][\mathbf{b}_m(r) - \mathbf{b}_n]'$$

is computed. The authors of the technique emphasize that this matrix is not a covariance matrix.[42]

[42] Note that we are not yet agreed that $\mathbf{b}_n$ even converges to a meaningful vector, since no underlying probability distribution as such has been assumed. Once it is agreed that there is an underlying regression function at work, then a meaningful set of asymptotic results, including consistency, can be developed. Manski and Thompson (1986) and Kim and Pollard (1990) present a number of results. Even so, it has been shown that the bootstrap MSD matrix is useful for little more than descriptive purposes. Horowitz’s (1993) smoothed maximum score estimator replaces the discontinuous $\text{sgn}(\boldsymbol{\beta}'\mathbf{x}_i)$ in the MSCORE criterion with a continuous weighting function, $\Phi(\boldsymbol{\beta}'\mathbf{x}_i/h)$, where $h$ is a bandwidth proportional to $n^{-1/5}$. He argues that this estimator is an improvement over Manski’s MSCORE estimator. (“Its asymptotic distribution is very complicated and not useful for making inferences in applications.” Later in the same paragraph he argues, “There has been no theoretical investigation of the properties of the bootstrap in maximum score estimation.”)

TABLE 21.5  Maximum Score Estimator

                      Maximum Score                   Probit
                  Estimate    Mean Square Dev.    Estimate    Standard Error
Constant   β1     −0.9317     0.1066              −7.4522     2.5420
GPA        β2     0.3582      0.2152              1.6260      0.6939
TUCE       β3     −0.01513    0.02800             0.05173     0.08389
PSI        β4     0.05902     0.2749              1.4264      0.5950

                  Fitted (Maximum Score)          Fitted (Probit)
                  0       1                       0       1
Actual     0      21      0                       18      3
           1      4       7                       3       8

Example 21.7 The Maximum Score Estimator

Table 21.5 presents maximum score estimates for Spector and Mazzeo’s GRADE model using $\alpha = 0.5$. Note that they are quite far removed from the probit estimates. (The estimates are extremely sensitive to the choice of $\alpha$.) Of course, there is no meaningful comparison of the coefficients, since the maximum score estimates are not the slopes of a conditional mean function. The prediction performance of the model is also quite sensitive to $\alpha$, but that is to be expected.[43] As expected, the maximum score estimator performs better than the probit estimator. The score is precisely the number of correct predictions in the 2 × 2 table, so the best that the probit model could possibly do is obtain the “maximum score.” In this example, it does not quite attain that maximum. [The literature awaits a comparison of the prediction performance of the probit/logit (parametric) approaches and this semiparametric model.] The relevant scores for the two estimators are also given in the table.

[43] The criterion function for choosing $\mathbf{b}$ is not continuous, and it has more than one optimum. M. E. Bissey reported finding that the score function varies significantly between the local optima as well. [Personal correspondence to the author, University of York (1995).]

Semiparametric approaches such as this one have the virtue that they do not make a possibly erroneous assumption about the underlying distribution. On the other hand, as seen in the example, there is no guarantee that the estimator will outperform the fully parametric estimator. One additional practical consideration is that semiparametric estimators such as this one are very computation intensive. At present, the maximum score estimator is not usable for more than roughly 15 coefficients and perhaps 1,500 to 2,000 observations.[44]

[44] Communication from C. Manski to the author. The maximum score estimator has been implemented by Manski and Thompson (1986) and Greene (1995a).

A third shortcoming of the approach is, unfortunately, inherent in its design. The parametric assumptions of the probit or logit produce a large amount of information about the relationship between the response variable and the covariate
s.Inthefinalanalysis,themarginaleffectsdiscussedearliermightwellhavebeentheprimaryobjectiveofthestudy.Thatinformationislosthere.21.5.4SEMIPARAMETRICESTIMATIONThefullyparametricprobitandlogitmodelsremainbyfarthemainstaysofempiricalresearchonbinarychoice.Fullynonparametricdiscretechoicemodelsarefairlyexoticandhavemadeonlylimitedinroadsintheliterature,andmuchofthatliteratureistheoretical[e.g.,Matzkin(1993)].Theprimaryobstacletoapplicationistheirpaucityofinterpretableresults.(SeeExample21.9.)Ofcourse,onecouldargueonthisbasisthatthefirmresultsproducedbythefullyparametricmodelsaremerelyfragileartifactsofthedetailedspecification,notgenuinereflectionsofsomeunderlyingtruth.[Inthisconnection,seeManski(1995).]Butthatorthodoxviewraisesthequestionofwhatmotivatesthestudytobeginwithandwhatonehopestolearnbyembarkinguponit.Theintentofmodelbuildingtoapproximaterealitysoastodrawusefulconclusionsishardlylimitedtotheanalysisofbinarychoices.Semiparametricestimatorsrepresentamiddlegroundbetweentheseextremeviews.45ThesingleindexmodelofKleinandSpady(1993)hasbeenusedinseveralapplications,includingGerfin(1996),Horowitz(1993),andFernandezandRodriguez-Poo(1997).46Thesingleindexformulationdepartsfromalinear“regression”formulation,E[y|x]=E[y|xβ].iiiiThenProb(y=1|x)=F(xβ|x)=G(xβ),iiiiiwhereGisanunknowncontinuousdistributionfunctionwhoserangeis[0,1].ThefunctionGisnotspecifiedapriori;itisestimatedalongwiththeparameters.(SinceGaswellasβistobeestimated,aconstanttermisnotidentified;essentially,Gprovidesthelocationfortheindexthatwouldotherwisebeprovidedbyaconstant.)Thecriterionfunctionforestimation,inwhichsubscriptsndenoteestimatorsoftheirunsubscriptedcounterparts,is1nlnL=ylnG(xβ)+(1−y)ln[1−G(xβ)].nininininni=1Theestimatoroftheprobabilityfunction,Gn,iscomputedateachiterationusinganonparametrickernelestimatorofthedensityofxβ;wedidthiscalculationinnSection16.4.FortheKleinandSpadyestimator,thenonparametricregression45RecentproposalsforsemiparametricestimatorsinadditiontotheonedevelopedhereincludeLewbel(199
7,2000),LewbelandHonore(2001),andAltonjiandMatzkin(2001).Inspiteofnearly10yearsofdevelopment,thisisanascentliterature.Thetheoreticaldevelopmenttendstofocusonroot-nconsistentcoefficientestimationinmodelswhichprovidenomeansofcomputationofprobabilitiesormarginaleffects.46AsymposiumonthesubjectisHardleandManski(1993).\nGreene-50240bookJune27,200222:39CHAPTER21✦ModelsforDiscreteChoice705estimatorisyg¯n(zi|yi=1)Gn(zi)=,yg¯n(zi|yi=1)+(1−y¯)gn(zi|yi=0)wheregn(zi|yi)isthekernelestimateofthedensityofzi=βnxi.Thisresultisn1zi−βnxjgn(zi|yi=1)=yjK;nyh¯nhnj=1gn(zi|yi=0)isobtainedbyreplacingy¯with1−y¯intheleadingscalarandyjwith1−yjinthesummation.Asbefore,hnisthebandwidth.Thereisnofirmtheoryforchoosingthekernelfunctionorthebandwidth.BothHorowitzandGerfinusedthestandardnormaldensity.Twodifferentmethodsforchoosingthebandwidtharesuggestedbythem.47KleinandSpadyprovidetheoreticalbackgroundforcomputingasymptoticstandarderrors.Example21.8AComparisonofBinaryChoiceEstimatorsGerfin(1996)didanextensiveanalysisofseveralbinarychoiceestimators,theprobitmodel,KleinandSpady’ssingleindexmodel,andHorowitz’ssmoothedmaximumscoreestimator.(Afourth“seminonparametric”estimatorwasalsoexamined,butintheinterestofbrevity,weconfineourattentiontothethreemorewidelyusedprocedures.)Theseveralmodelswereallfittotwodatasetsonlaborforceparticipationofmarriedwomen,onefromSwitzerlandandonefromGermany.Variablesincludedintheequationwere(ournotation),x1=aconstant,x=age,x=age2,x=education,x=numberofyoungchildren,x=numberofolder23456children,x7=logofyearlynonlaborincome,andx8=adummyvariableforpermanentfor-eignresident(Swissdataonly).Coefficientestimatesforthemodelsarenotdirectlycompa-rable.WesuggestedinExample21.3thattheycouldbemadecomparablebytransformingthemtomarginaleffects.NeitherMSCOREnorthesingleindexmodel,however,producesamarginaleffect(whichdoessuggestaquestionofinterpretation).Theauthorobtainedcom-parabilitybydividingallcoefficientsbytheabsolutevalueofthecoefficientonx7.ThesetofnormalizedcoefficientsestimatedfortheS
wissdataappearsinTable21.6,withestimatedstandarderrors(fromGerfin’sTableIII)showninparentheses.Giventheverylargedifferencesinthemodels,theagreementoftheestimatesisimpres-sive.[AsimilarcomparisonofthesameestimatorswithcomparableconcordancemaybefoundinHorowitz(1993,p.56).]Ineverycase,thestandarderroroftheprobitestimatorissmallerthanthatoftheothers.Itistemptingtoconcludethatitisamoreefficientestimator,butthatistrueonlyifthenormaldistributionassumedforthemodeliscorrect.Inanyevent,thesmallerstandarderroristhepayofftothesharperspecificationofthedistribution.Thispayoffcouldbeviewedinmuchthesamewaythatparametricrestrictionsintheclassicalregressionmaketheasymptoticcovariancematrixoftherestrictedleastsquaresestimatorsmallerthanitsunrestrictedcounterpart,eveniftherestrictionsareincorrect.GerfinthenproducedplotsofF(z)forzintherangeofthesamplevaluesofbx.Onceagain,thefunctionsaresurprisinglyclose.IntheGermandata,however,theKlein–Spadyestimatorisnonmonotonicoverasizeablerange,whichwouldcausesomedifficultproblemsofinterpretation.Themaximumscoreestimatordoesnotproduceanestimateoftheproba-bility,soitisexcludedfromthiscomparison.Anothercomparisonisbasedonthepredictionsoftheobservedresponse.Twoapproachesaretried,firstcountingthenumberofcasesinwhichthepredictedprobabilityexceeds0.5.(bx>0forMSCORE)andsecondbysummingthesamplevaluesofF(bx).(Onceagain,MSCOREisexcluded.)Bythesecondapproach,47ThefunctionGn(z)involvesanenormousamountofcomputation,ontheorderofn2,inprinciple.AsGerfin(1996)observes,however,computationofthekernelestimatorcanbecastasaFouriertransform,forwhichthefastFouriertransformreducestheamountofcomputationtotheorderofnlog2n.Thisvalueisonlyslightlylargerthanlinearinn.SeePressetal.(1986)andGerfin(1996).\nGreene-50240bookJune27,200222:39706CHAPTER21✦ModelsforDiscreteChoiceTABLE21.6EstimatedParametersforSemiparametricModelsx1x2x3x4x5x6x7x8hProbit5.623.11−0.440.03−1.07−0.22−1.001.07—(1.35)(0.77)(0.10)(0.03)(0.26)(0.09)—(0.29)Single—2.98−0.440.02−1.32−0.25−1.001.060.40index—(0.90)(0.
12)(0.03)(0.33)(0.11)—(0.32)MSCORE5.832.84−0.400.03−0.80−0.16−1.000.910.70(1.78)(0.98)(0.13)(0.05)(0.43)(0.20)—(0.57)theestimatorsarealmostindistinguishable,buttheresultsforthefirstdifferwidely.Of401ones(outof873observations),thecountsofpredictedonesare389forprobit,382forKlein/Spady,and355forMSCORE.(Theresultsdonotindicatehowmanyofthesecountsarecorrectpredictions.)21.5.5AKERNELESTIMATORFORANONPARAMETRICREGRESSIONFUNCTIONAsnoted,oneunsatisfactoryaspectofsemiparametricformulationssuchasMSCOREisthattheamountofinformationthattheprocedureprovidesaboutthepopulationislimited;thisaspectis,afterall,thepurposeofdispensingwiththefirm(parametric)assumptionsoftheprobitandlogitmodels.Thus,intheprecedingexample,thereislittlethatonecansayaboutthepopulationthatgeneratedthedatabasedontheMSCORE“estimates”inthetable.Theestimatesdoallowpredictionsoftheresponsevariable.Butthereislittleinformationaboutanyrelationshipbetweentheresponseandtheinde-pendentvariablesbasedonthe“estimation”results.Eventhemean-squareddeviationmatrixissuspectasanestimatoroftheasymptoticcovariancematrixoftheMSCOREcoefficients.Theauthorsofthetechniquehaveproposedasecondaryanalysisoftheresults.LetF(z)=E[y|xβ=z]βiiiidenoteasmoothregressionfunctionfortheresponsevariable.Basedonaparametervectorβ,theauthorsproposetoestimatetheregressionbythemethodofkernelsasfollows.Forthenobservationsinthesampleandforthegivenβ(e.g.,bnfromMSCORE),letz=xβ,iin1/21s=(z−z¯)2.ini=1Foraparticularvaluez∗,wecomputeasetofnweightsusingthekernelfunction,w(z∗)=K[(z∗−z)/(λs)],iiwhereK(ri)=P(ri)[1−P(ri)]\nGreene-50240bookJune27,200222:39CHAPTER21✦ModelsforDiscreteChoice707andP(r)=[1+exp(−cr)]−1.ii√Theconstantc=(π/3)−1≈0.55133isusedtostandardizethelogisticdistributionthatisusedforthekernelfunction.(SeeSection16.4.1.)Theparameterλisthesmoothing(bandwidth)parameter.Largevalueswillflattentheestimatedfunctionthroughy¯,whereasvaluesclosetozerowillallowgreatervariationinthefunctionbutmightcauseittobeunstable.Thereisnogoodtheoryforthechoice,butsomesuggestionsha
vebeenmadebasedondescriptivestatistics.[SeeWong(1983)andManski(1986).]Finally,thefunctionvalueisestimatedwithn∗∗i=1wi(z)yiF(z)≈n.wi(z∗)i=1Example21.9NonparametricRegressionFigure21.3showsaplotoftwoestimatesoftheregressionfunctionforE[GRADE|z].ThecoefficientsaretheMSCOREestimatesgiveninTable21.5.Theplotisproducedbycom-putingfittedvaluesfor100equallyspacedpointsintherangeofxb,whichforthesedatanandcoefficientsis[−0.66229,0.05505].Thefunctionisestimatedwithtwovaluesofthesmoothingparameter,1.0and0.3.Asexpected,thefunctionbasedonλ=1.0ismuchflatterthanthatbasedonλ=0.3.Clearly,theresultsoftheanalysisarecruciallydependentonthevalueassumed.ThenonparametricestimatordisplaysarelationshipbetweenxβandE[y].Atfirstiblush,thisrelationshipmightsuggestthatwecoulddeducethemarginaleffects,butunfortunately,thatisnotthecase.Thecoefficientsinthissettingarenotmeaningful,soallwecandeduceisanestimateofthedensity,f(z),byusingfirstdifferencesoftheestimatedregressionfunction.Itmightseem,therefore,thattheanalysishasproducedFIGURE21.3NonparametricRegression.0.720.640.560.48)0.40x(1F0.320.240.160.30.080.000.700.600.500.400.300.200.100.000.10x\nGreene-50240bookJune27,200222:39708CHAPTER21✦ModelsforDiscreteChoicerelativelylittlepayofffortheeffort.Butthatshouldcomeasnosurpriseifwereconsidertheassumptionswehavemadetoreachthispoint.Theonlyassumptionsmadethusfararethatforagivenvectorofcovariatesxiandcoefficientvectorβ(thatis,anyβ),thereexistsasmoothfunctionF(xβ)=E[y|z].Wehavealsoassumed,atleastim-iiplicitly,thatthecoefficientscarrysomeinformationaboutthecovariationofxβandtheresponsevariable.Thetechniquewillapproximateanysuchfunction[seeManski(1986)].Thereisalargeandburgeoningliteratureonkernelestimationandnonparametricestimationineconometrics.[ArecentapplicationisMelenbergandvanSoest(1996).]Asthissimpleexamplesuggests,withtheradicallydifferentformsofthespecifiedmodel,theinformationthatisculledfromthedatachangesradicallyaswell.Thegeneralprin-ciplenowmadeevidentisthatthefewerassumptionsonemakesaboutthepopu
lation,thelessprecisetheinformationthatcanbededucedbystatisticaltechniques.Thattradeoffisinherentinthemethodology.21.5.6DYNAMICBINARYCHOICEMODELSArandomorfixedeffectsmodelwhichexplicitlyallowsforlaggedeffectswouldbey=1(xβ+α+γy+ε>0).ititii,t−1itLaggedeffects,orpersistence,inabinarychoicesettingcanarisefromthreesources,serialcorrelationinεit,theheterogeneity,αi,ortruestatedependencethroughthetermγyi,t−1.Chiappori(1998)[andseeArellano(2001)]suggestsanapplicationtotheFrenchautomobileinsurancemarketinwhichtheincentivesbuiltintothepricingsystemaresuchthathavinganaccidentinoneperiodshouldlowertheprobabilityofhavingoneinthenext(statedependence),but,somedriversremainmorelikelytohaveaccidentsthanothersineveryperiod,whichwouldreflecttheheterogeneityinstead.Statedependenceislikelytobeparticularlyimportantinthetypicalpanelwhichhasonlyafewobservationsforeachindividual.Heckman(1981a)examinedthisissueatlength.AmonghisfindingswerethatthesomewhatmutedsmallsamplebiasinfixedeffectsmodelswithT=8wasmademuchworsewhentherewasstatedependence.Arelatedproblemisthatwitharelativelyshortpanel,theinitialconditions,yi0,haveacrucialimpactontheentirepathofoutcomes.Modelingdynamiceffectsandinitialconditionsinbinarychoicemodelsismorecomplexthaninthelinearmodel,andbycomparisontherearerelativelyfewerfirmresultsintheappliedliterature.Muchofthecontemporaryliteraturehasfocusedonmethodsofavoidingthestrongparametricassumptionsoftheprobitandlogitmodels.Manski(1987)andHonoreandKyriadizou(2000)showthatManski’s(1986)maximumscoreestimatorcanbeappliedtothedifferencesofunequalpairsofobservationsinatwoperiodpanelwithfixedeffects.However,thelimitationsofthemaximumscoreestimatornotedearlierhavemotivatedresearchonotherapproaches.AnextensionoflaggedeffectstoaparametricmodelisChamberlain(1985),JonesandLandwehr(1988)andMagnac(1997)whoaddedstatedependencetoChamberlain’sfixedeffectslogitestimator.Unfortunately,oncetheidentificationissuesaresettled,themodelisonlyoperationaliftherearenootherexogenousvariablesinit,whichlimitsi
susefulnessforpracticalapplication.Lewbel(2000)hasextendedhisfixedeffectsestimatortodynamicmodelsaswell.Inthisframework,thenarrowassumptionsabouttheindependentvariablessomewhat\nGreene-50240bookJune27,200222:39CHAPTER21✦ModelsforDiscreteChoice709limititspracticalapplicability.HonoreandKyriazidou(2000)havecombinedthelogicoftheconditionallogitmodelandManski’smaximumscoreestimator.TheyspecifyProb(yi0=1|xi,αi)=p0(xi,αi)wherexi=(xi1,xi2,...,xiT)Prob(y=1|x,α,y,y,...,y)=F(xβ+α+γy)t=1,...,Titiii0i1i,t−1itii,t−1TheanalysisassumesasingleregressorandfocusesonthecaseofT=3.TheresultingestimatorresemblesChamberlain’sbutreliesonobservationsforwhichxit=xi,t−1whichrulesoutdirecttimeeffectsaswellas,forpracticalpurposes,anycontinuousvariable.Therestrictiontoasingleregressorlimitsthegeneralityofthetechniqueaswell.Theneedforobservationswithequalvaluesofxitisaconsiderablerestriction,andtheauthorsproposeakerneldensityestimatorforthedifference,xit−xi,t−1,insteadwhichdoesrelaxthatrestrictionabit.Theendresultisanestimatorwhichconverges(theyconjecture)buttoanonnormaldistributionandatarateslowerthann−1/3.Semiparametricestimatorsfordynamicmodelsatthispointinthedevelopmentarestillprimarilyoftheoreticalinterest.Modelsthatextendtheparametricformulationstoincludestatedependencehaveamuchlongerhistory,includingHeckman(1978,1981a,1981b),HeckmanandMaCurdy(1980),Jakubson(1988),Keane(1993)andBecketal.(2001)tonameafew.48Ingeneral,evenwithoutheterogeneity,dynamicmodelsultimatelyinvolvemodelingthejointoutcome(yi0,...,yiT)whichnecessitatessometreatmentinvolvingmultivariateintegration.Example21.10describesarecentapplication.Example21.10AnIntertemporalLaborForceParticipationEquationHyslop(1999)presentsamodelofthelaborforceparticipationofmarriedwomen.Thefocusofthestudyisthehighdegreeofpersistenceintheparticipationdecision.Datausedinthestudyweretheyears1979–1985ofthePanelStudyofIncomeDynamics.Asampleof1812continuouslymarriedcoupleswerestudied.Exogenousvariableswhichappearedinthemodelweremeasuresofpermanentandtra
nsitoryincomeandfertilitycapturedinyearlycountsofthenumberofchildrenfrom0–2,3–5and6–17yearsold.Hyslop’sformulation,ingeneralterms,is(initialcondition)yi0=1(xi0β0+vi0>0),(dynamicmodel)yit=1(xitβ+γyi,t−1+αi+vit>0)(heterogeneitycorrelatedwithparticipation)αi=ziδ+ηi,(Stochasticspecification)η|X∼N0,σ2,iiηv|X∼N0,σ2,i0i0w|X∼N0,σ2,itiwv=ρv+w,σ2+σ2=1.iti,t−1itηwCorr[v,v]=ρt,t=1,...,T−1.i0it48Becketal.(2001)isabitdifferentfromtheothersmentionedinthatintheirstudyof“statefailure,”theyobservealargesampleofcountries(147)observedoverafairlylargenumberofyears,40.Assuch,theyareabletoformulatetheirmodelsinawaythatmakestheasymptoticswithrespecttoTappropriate.Theycananalyzethedataessentiallyinatimeseriesframework.Sepanski(2000)isanotherapplicationwhichcombinesstatedependenceandtherandomcoefficientspecificationofAkin,Guilkey,andSickles(1979).\nGreene-50240bookJune27,200222:39710CHAPTER21✦ModelsforDiscreteChoiceThepresenceoftheautocorrelationandstatedependenceinthemodelinvalidatethesim-plemaximumlikelihoodprocedureswehaveexaminedearlier.TheappropriatelikelihoodfunctionisconstructedbyformulatingtheprobabilitiesasProb(yi0,yi1,...)=Prob(yi0)×Prob(yi1|yi0)×···×Prob(yiT|yi,T−1)ThisstillinvolvesaT=7ordernormalintegration,whichisapproximatedinthestudyusingasimulatorsimilartotheGHKsimulatordiscussedinE.4.2e.AmongHyslop’sresultsareacomparisonofthemodelfitbythesimulatorforthemultivariatenormalprobabilitieswiththesamemodelfitusingthemaximumsimulatedlikelihoodtechniquedescribedinSection17.8.21.6BIVARIATEANDMULTIVARIATEPROBITMODELSInChapter14,weanalyzedanumberofdifferentmultiple-equationextensionsoftheclassicalandgeneralizedregressionmodel.Anaturalextensionoftheprobitmodelwouldbetoallowmorethanoneequation,withcorrelateddisturbances,inthesamespiritastheseeminglyunrelatedregressionsmodel.Thegeneralspecificationforatwo-equationmodelwouldbey∗=xβ+ε,y=1ify∗>0,0otherwise,111111y∗=xβ+ε,y=1ify∗>0,0otherwise,222222E[ε1|x1,x2]=E[ε2|x1,x2]=0,(21-41)Var[ε1|x1,x2]=Var[ε2|x1,x2]=1,Cov[ε1,ε2|x1,x2]=ρ.21.6.1MAXIMUMLI
KELIHOODESTIMATIONThebivariatenormalcdfisx2x1Prob(X11,y2<3,y3<−1)|x1,x2,ρ12,ρ13,ρ23),wewouldsimplydrawrandomob-servationsfromthistrivariatenormaldistribution(seeSectionE.5.6.)andcountthenumberofobservationsthatsatisfytheinequality.Toobtainanaccurateestimateoftheprobability,quitealargenumberofdrawsisrequired.Also,thesubstantivepossibilityofgettingzerosuchdrawsinafinitenumberofdrawsisproblematic.Nonetheless,thelogicoftheLerman–Manskiapproachissound.AsdiscussedinSectionE.5.6recentdevelopmentshaveproducedmethodsofproducingquiteaccurateestimatesofmulti-variatenormalintegralsbasedonthisprinciple.Theevaluationofmultivariatenormalintegralisgenerallyamuchlessformidableobstacletotheestimationofmodelsbasedonthemultivariatenormaldistribution.55McFadden(1989)pointedoutthatforpurposesofmaximumlikelihoodestimation,accurateevaluationofprobabilitiesisnotnecessarilytheproblemthatneedstobesolved.Onecanviewthecomputationofthelog-likelihoodanditsderivativesasaproblemofestimatingamean.Thatis,in(21-41)and(21-42),thesameproblemarisesifwedividebyn.Theideaisthateventhoughtheindividualtermsintheaveragemightbeinerror,iftheerrorhasmeanzero,thenitwillaverageoutinthesummation.Theimportantinsight,then,isthatifwecanobtainprobabilityestimatesthatonlyerrrandomlybothpositivelyandnegatively,thenitmaybepossibletoobtainanestimateofthelog-likelihoodanditsderivativesthatisreasonablyclosetotheonethatwould53ThemodelwasfirstproposedbyWynandandvanPraag(1981).54ExtensionsofthebivariateprobitmodeltoothertypesofcensoringarediscussedinPoirier(1980)andAbowdandFarber(1982).55PapersthatproposeimprovedmethodsofsimulatingprobabilitiesincludePakesandPollard(1989)andespeciallyBorsch-SupanandHajivassilou(1990),Geweke(1989),andKeane(1994).Asymposiuminthe¨November1994issueofReviewofEconomicsandStatisticspresentsdiscussionofnumerousissuesinspeci-ficationandestimationofmodelsbasedonsimulationofprobabilities.Applicationsthatemploysimulationtechniquesforevaluationofmultivariatenormalintegralsarenowfairlynumerous.See,forexample,
Hyslop (1999) (Example 21.10), who applies the technique to a panel data application with T = 7.

result from actually computing the integral. From a practical standpoint, it does not take inordinately large numbers of random draws to achieve this result, which, with the progress that has been made on Monte Carlo integration, has made feasible multivariate models that previously were intractable.

The multivariate probit model in another form presents a useful extension of the probit model to panel data. The structural equation for the model would be

$$y_{it}^* = \mathbf{x}_{it}'\boldsymbol{\beta} + \varepsilon_{it}, \quad y_{it} = 1 \text{ if } y_{it}^* > 0,\ 0 \text{ otherwise}, \quad i = 1, \ldots, n;\ t = 1, \ldots, T.$$

The Butler and Moffitt approach for this model has proved useful in numerous applications. But the underlying assumption that $\text{Cov}[\varepsilon_{it}, \varepsilon_{is}] = \rho$ is a substantive restriction. By treating this structure as a multivariate probit model with a restriction that the coefficient vector be the same in every period, one can obtain a model with free correlations across periods. Hyslop (1999) and Greene (2002) are two applications.

21.6.6 APPLICATION: GENDER ECONOMICS COURSES IN LIBERAL ARTS COLLEGES

Burnett (1997) proposed the following bivariate probit model for the presence of a gender economics course in the curriculum of a liberal arts college:

$$\text{Prob}[y_1 = 1, y_2 = 1 \mid \mathbf{x}_1, \mathbf{x}_2] = \Phi_2(\mathbf{x}_1'\boldsymbol{\beta}_1 + \gamma y_2,\ \mathbf{x}_2'\boldsymbol{\beta}_2,\ \rho).$$

The dependent variables in the model are

y1 = presence of a gender economics course,
y2 = presence of a women's studies program on the campus.

The independent variables in the model are

z1 = constant term;
z2 = academic reputation of the college, coded 1 (best), 2, ... to 141;
z3 = size of the full time economics faculty, a count;
z4 = percentage of the economics faculty that are women, proportion (0 to 1);
z5 = religious affiliation of the college, 0 = no, 1 = yes;
z6 = percentage of the college faculty that are women, proportion (0 to 1);
z7–z10 = regional dummy variables, south, midwest, northeast, west.

The regressor vectors are

x1 = z1, z2, z3, z4, z5,
x2 = z2, z6, z5, z7–z10.

Burnett's model illustrates a number of interesting aspects of the bivariate probit model. Note that this model is qualitatively different from the bivariate probit model in (21-41); the second dependent variable, $y_2$, appears on the right-hand side of the first equation. This model is a recursive, simultaneous-equations model. Surprisingly, the endogenous nature of one of the variables on the right-hand side of the first equation can be ignored in formulating the log-likelihood. [The model appears in Maddala (1983, p. 123).] We can establish this fact with the following (admittedly trivial) argument: The term that enters the log-likelihood is $P(y_1 = 1, y_2 = 1) = P(y_1 = 1 \mid y_2 = 1)P(y_2 = 1)$. Given the model as stated, the marginal probability for $y_2$ is just $\Phi(\mathbf{x}_2'\boldsymbol{\beta}_2)$, whereas the conditional probability is $\Phi_2(\ldots)/\Phi(\mathbf{x}_2'\boldsymbol{\beta}_2)$. The product returns the probability we had earlier. The other three terms in the log-likelihood are derived similarly, which produces (Maddala's results with some sign changes):

$$
\begin{aligned}
P_{11} &= \Phi_2(\mathbf{x}_1'\boldsymbol{\beta}_1 + \gamma y_2,\ \mathbf{x}_2'\boldsymbol{\beta}_2,\ \rho), & P_{10} &= \Phi_2(\mathbf{x}_1'\boldsymbol{\beta}_1,\ -\mathbf{x}_2'\boldsymbol{\beta}_2,\ -\rho),\\
P_{01} &= \Phi_2[-(\mathbf{x}_1'\boldsymbol{\beta}_1 + \gamma y_2),\ \mathbf{x}_2'\boldsymbol{\beta}_2,\ -\rho], & P_{00} &= \Phi_2(-\mathbf{x}_1'\boldsymbol{\beta}_1,\ -\mathbf{x}_2'\boldsymbol{\beta}_2,\ \rho).
\end{aligned}
$$

These terms are exactly those of (21-41) that we obtain just by carrying $y_2$ in the first equation with no special attention to its endogenous nature. We can ignore the simultaneity in this model and we cannot in the linear regression model because, in this instance, we are maximizing the log-likelihood, whereas in the linear regression case, we are manipulating certain sample moments that do not converge to the necessary population parameters in the presence of simultaneity. Note that the same result is at work in Section 15.6.2, where the FIML estimator of the simultaneous equations model is obtained with the endogenous variables on the right-hand sides of the equations, but not by using ordinary least squares.

The marginal effects in this model are fairly involved, and as before, we can consider several different types. Consider, for example, $z_2$, academic reputation. There is a direct effect produced by its presence in the first equation, but there is also an indirect effect. Academic reputation enters the women's studies equation and, therefore, influences the probability that $y_2$ equals one. Since $y_2$ appears in the first equation, this effect is transmitted back to $y_1$. The total effect of academic reputation and, likewise, religious affiliation is the sum of these two parts. Consider first the gender economics variable, $y_1$. The conditional mean is

$$
\begin{aligned}
E[y_1 \mid \mathbf{x}_1, \mathbf{x}_2] &= \text{Prob}[y_2 = 1]\,E[y_1 \mid y_2 = 1, \mathbf{x}_1, \mathbf{x}_2] + \text{Prob}[y_2 = 0]\,E[y_1 \mid y_2 = 0, \mathbf{x}_1, \mathbf{x}_2]\\
&= \Phi_2(\mathbf{x}_1'\boldsymbol{\beta}_1 + \gamma y_2,\ \mathbf{x}_2'\boldsymbol{\beta}_2,\ \rho) + \Phi_2(\mathbf{x}_1'\boldsymbol{\beta}_1,\ -\mathbf{x}_2'\boldsymbol{\beta}_2,\ -\rho).
\end{aligned}
$$

Derivatives can be computed using our earlier results. We are also interested in the effect of religious affiliation. Since this variable is binary, simply differentiating the conditional mean function may not produce an accurate result. Instead, we would compute the conditional mean function with this variable set to one and then zero, and take the difference. Finally, what is the effect of the presence of a women's studies program on the probability that the college will offer a gender economics course? To compute this effect, we would compute $\text{Prob}[y_1 = 1 \mid y_2 = 1, \mathbf{x}_1, \mathbf{x}_2] - \text{Prob}[y_1 = 1 \mid y_2 = 0, \mathbf{x}_1, \mathbf{x}_2]$. In all cases, standard errors for the estimated marginal effects can be computed using the delta method.

Maximum likelihood estimates of the parameters of Burnett's model were computed by Greene (1998) using her sample of 132 liberal arts colleges; 31 of the schools offer gender economics, 58 have women's studies, and 29 have both. The estimated parameters are given in Table 21.7. Both bivariate probit and the single-equation estimates are given. The estimate of ρ is only 0.1359, with a standard error of 1.2539. The Wald statistic for the test of the hypothesis that ρ equals zero is (0.1359/1.2539)² = 0.011753. For a single restriction, the critical value from the chi-squared table is 3.84, so the hypothesis cannot be rejected. The likelihood ratio statistic for the same hypothesis is 2[−85.6317 − (−85.6458)] = 0.0282, which leads to the same conclusion. The Lagrange multiplier statistic is 0.003807, which is consistent. This result might seem counterintuitive, given the setting. Surely "gender economics" and "women's studies" are highly correlated, but this finding does not contradict that proposition. The correlation coefficient measures the correlation between the disturbances in the equations, the omitted factors. That is, ρ measures (roughly) the correlation between the outcomes after the influence of the included factors is accounted for. Thus, the value 0.13 measures the effect after the influence of women's studies is already accounted for. As discussed in the next paragraph, the proposition turns out to be right. The single most important determinant (at least within this model) of whether a gender economics course will be offered is indeed whether the college offers a women's studies program.

TABLE 21.7 Estimates of a Recursive Simultaneous Bivariate Probit Model (Estimated Standard Errors in Parentheses)

| Variable | Single Equation: Coefficient | Single Equation: Standard Error | Bivariate Probit: Coefficient | Bivariate Probit: Standard Error |
|---|---|---|---|---|
| Gender Economics Equation | | | | |
| Constant | −1.4176 | (0.8069) | −1.1911 | (2.2155) |
| AcRep | −0.01143 | (0.004081) | −0.01233 | (0.007937) |
| WomStud | 1.1095 | (0.5674) | 0.8835 | (2.2603) |
| EconFac | 0.06730 | (0.06874) | 0.06769 | (0.06952) |
| PctWecon | 2.5391 | (0.9869) | 2.5636 | (1.0144) |
| Relig | −0.3482 | (0.4984) | −0.3741 | (0.5265) |
| Women's Studies Equation | | | | |
| AcRep | −0.01957 | (0.005524) | −0.01939 | (0.005704) |
| PctWfac | 1.9429 | (0.8435) | 1.8914 | (0.8714) |
| Relig | −0.4494 | (0.3331) | −0.4584 | (0.3403) |
| South | 1.3597 | (0.6594) | 1.3471 | (0.6897) |
| West | 2.3386 | (0.8104) | 2.3376 | (0.8611) |
| North | 1.8867 | (0.8204) | 1.9009 | (0.8495) |
| Midwest | 1.8248 | (0.8723) | 1.8070 | (0.8952) |
| ρ | 0.0000 | (0.0000) | 0.1359 | (1.2539) |
| Log L | −85.6458 | | −85.6317 | |

Table 21.8 presents the estimates of the marginal effects and some descriptive statistics for the data. The calculations were simplified slightly by using the restricted model with ρ = 0. Computations of the marginal effects still require the decomposition above, but they are simplified slightly by the result that if ρ equals zero, then the bivariate probabilities factor into the products of the marginals. Numerically, the strongest effect appears to be exerted by the representation of women on the faculty; its coefficient of +0.4491 is by far the largest. This variable, however, cannot change by a full unit because it is a proportion. An increase of 1 percent in the presence of women on the faculty raises the probability by only +0.004, which is comparable in scale to the effect of academic reputation. The effect of women on the faculty is likewise fairly small, only 0.0013 per 1 percent change. As might have been expected, the single most important influence is the presence of a women's studies program, which increases the likelihood of a gender economics course by a full 0.1863. Of course, the raw data would have anticipated this result; of the 31 schools that offer a gender economics course, 29 also have a women's studies program and only two do not. Note finally that the effect of religious affiliation (whatever it is) is mostly direct.

TABLE 21.8 Marginal Effects in Gender Economics Model

| Variable | Direct | Indirect | Total | (Std. Error) | (Type of Variable, Mean) |
|---|---|---|---|---|---|
| Gender Economics Equation | | | | | |
| AcRep | −0.002022 | −0.001453 | −0.003476 | (0.00126) | (Continuous, 119.242) |
| PctWecon | +0.4491 | | +0.4491 | (0.1568) | (Continuous, 0.24787) |
| EconFac | +0.01190 | | +0.1190 | (0.01292) | (Continuous, 6.74242) |
| Relig | −0.07049 | −0.03227 | −0.1028 | (0.1055) | (Binary, 0.57576) |
| WomStud | +0.1863 | | +0.1863 | (0.0868) | (Endogenous, 0.43939) |
| PctWfac | | +0.13951 | +0.13951 | (0.08916) | (Continuous, 0.35772) |
| Women's Studies Equation | | | | | |
| AcRep | −0.00754 | | −0.00754 | (0.002187) | (Continuous, 119.242) |
| PctWfac | +0.13789 | | +0.13789 | (0.01002) | (Continuous, 0.35772) |
| Relig | −0.13265 | | −0.13266 | (0.18803) | (Binary, 0.57576) |

Before closing this application, we can use this opportunity to examine the fit measures listed in Section 21.4.5. We computed the various fit measures using seven different specifications of the gender economics equation:

1. Single-equation probit estimates, z1, z2, z3, z4, z5, y2
2. Bivariate probit model estimates, z1, z2, z3, z4, z5, y2
3. Single-equation probit estimates, z1, z2, z3, z4, z5
4. Single-equation probit estimates, z1, z3, z5, y2
5. Single-equation probit estimates, z1, z3, z5
6. Single-equation probit estimates, z1, z5
7. Single-equation probit estimates, z1 (constant only).

The specifications are in descending "quality" because we removed the most statistically significant variables from the model at each step. The values are listed in Table 21.9. The matrix below each column is the table of "hits" and "misses" of the prediction rule $\hat{y} = 1$ if $\hat{P} > 0.5$, 0 otherwise. [Note that by construction, model (7) must predict all ones or all zeros.] The column is the actual count and the row is the prediction. Thus, for model (1), 92 of 101 zeros were predicted correctly, whereas five of 31 ones were predicted incorrectly. As one would hope, the fit measures decline as the more significant variables are removed from the model. The Ben-Akiva measure has an obvious flaw in that with only a constant term, the model still obtains a "fit" of 0.641. From the prediction matrices, it is clear that the explanatory power of the model, such as it is, comes from its ability to predict the ones correctly. The poorer is the model, the greater the number of correct predictions of y = 0. But as this number rises, the number of incorrect predictions rises and the number of correct predictions of y = 1 declines. All the fit measures appear to react to this feature to some degree. The Efron and Cramer measures, which are nearly identical, and McFadden's LRI appear to be most sensitive to this, with the remaining two only slightly less consistent.

TABLE 21.9 Binary Choice Fit Measures

| Measure | (1) | (2) | (3) | (4) | (5) | (6) | (7) |
|---|---|---|---|---|---|---|---|
| LRI | 0.573 | 0.535 | 0.495 | 0.407 | 0.279 | 0.206 | 0.000 |
| R²_BL | 0.844 | 0.844 | 0.823 | 0.797 | 0.754 | 0.718 | 0.641 |
| λ | 0.565 | 0.560 | 0.526 | 0.444 | 0.319 | 0.216 | 0.000 |
| R²_EF | 0.561 | 0.558 | 0.530 | 0.475 | 0.343 | 0.216 | 0.000 |
| R²_VZ | 0.708 | 0.707 | 0.672 | 0.589 | 0.447 | 0.352 | 0.000 |
| R²_MZ | 0.687 | 0.679 | 0.628 | 0.567 | 0.545 | 0.329 | 0.000 |
| Predictions | 92, 9 | 93, 8 | 92, 9 | 94, 7 | 98, 3 | 101, 0 | 101, 0 |
| | 5, 26 | 5, 26 | 8, 23 | 8, 23 | 16, 15 | 31, 0 | 31, 0 |

21.7 LOGIT MODELS FOR MULTIPLE CHOICES

Some studies of multiple-choice settings include the following:

1. Hensher (1986), McFadden (1974), and many others have analyzed the travel mode of urban commuters.
2. Schmidt and Strauss (1975a,b) and Boskin (1974) have analyzed occupational choice among multiple alternatives.
3. Terza (1985) has studied the assignment of bond ratings to corporate bonds as a choice among multiple alternatives.

These are all distinct from the multivariate probit model we examined earlier. In that setting, there were several decisions, each between two alternatives. Here there is a single decision among two or more alternatives. We will examine two broad types of choice sets, ordered and unordered. The choice among means of getting to work (by car, bus, train, or bicycle) is clearly unordered. A bond rating is, by design, a ranking; that is its purpose. As we shall see, quite different techniques are used for the two types of models. Models for unordered choice sets are considered in this section. A model for ordered choices is described in Section 21.8.

Unordered-choice models can be motivated by a random utility model. For the ith consumer faced with J choices, suppose that the utility of choice j is

$$U_{ij} = \mathbf{z}_{ij}'\boldsymbol{\beta} + \varepsilon_{ij}.$$

If the consumer makes choice j in particular, then we assume that $U_{ij}$ is the maximum among the J utilities. Hence, the statistical model is driven by the probability that choice j is made, which is

$$\text{Prob}(U_{ij} > U_{ik}) \quad \text{for all other } k \neq j.$$

The model is made operational by a particular choice of distribution for the disturbances. As before, two models have been considered, logit and probit. Because of the need to evaluate multiple integrals of the normal distribution, the probit model has found rather limited use in this setting. The logit model, in contrast, has been widely used in many fields, including economics, market research, and transportation engineering. Let $Y_i$ be a random variable that indicates the choice made. McFadden (1973) has shown that if (and only if) the J disturbances are independent and identically distributed with type I extreme value (Gumbel) distribution,

$$F(\varepsilon_{ij}) = \exp(-e^{-\varepsilon_{ij}}),$$

then

$$\text{Prob}(Y_i = j) = \frac{e^{\mathbf{z}_{ij}'\boldsymbol{\beta}}}{\sum_{j=1}^{J} e^{\mathbf{z}_{ij}'\boldsymbol{\beta}}}, \tag{21-44}$$

which leads to what is called the conditional logit model.[56]

Utility depends on $\mathbf{z}_{ij}$, which includes aspects specific to the individual as well as to the choices. It is useful to distinguish them. Let $\mathbf{z}_{ij} = [\mathbf{x}_{ij}, \mathbf{w}_i]$. Then $\mathbf{x}_{ij}$ varies across the choices and possibly across the individuals as well. The components of $\mathbf{x}_{ij}$ are typically called the attributes of the choices. But $\mathbf{w}_i$ contains the characteristics of the individual and is, therefore, the same for all choices. If we incorporate this fact in the model, then (21-44) becomes

$$\text{Prob}(Y_i = j) = \frac{e^{\boldsymbol{\beta}'\mathbf{x}_{ij} + \boldsymbol{\alpha}'\mathbf{w}_i}}{\sum_{j=1}^{J} e^{\boldsymbol{\beta}'\mathbf{x}_{ij} + \boldsymbol{\alpha}'\mathbf{w}_i}} = \frac{e^{\boldsymbol{\beta}'\mathbf{x}_{ij}}\, e^{\boldsymbol{\alpha}'\mathbf{w}_i}}{\sum_{j=1}^{J} e^{\boldsymbol{\beta}'\mathbf{x}_{ij}}\, e^{\boldsymbol{\alpha}'\mathbf{w}_i}}.$$

Terms that do not vary across alternatives (that is, those specific to the individual) fall out of the probability. Evidently, if the model is to allow individual specific effects, then it must be modified. One method is to create a set of dummy variables for the choices and multiply each of them by the common $\mathbf{w}$. We then allow the coefficient to vary across the choices instead of the characteristics. Analogously to the linear model, a complete set of interaction terms creates a singularity, so one of them must be dropped. For example, a model of a shopping center choice by individuals might specify that the choice depends on attributes of the shopping centers such as number of stores and distance from the central business district, both of which are the same for all individuals, and income, which varies across individuals. Suppose that there were three choices. The three regressor vectors would be as follows:

Choice 1: Stores, Distance, Income, 0
Choice 2: Stores, Distance, 0, Income
Choice 3: Stores, Distance, 0, 0

The data sets typically analyzed by economists do not cont
ainmixturesofindividual-andchoice-specificattributes.Suchdatawouldbefartoocostlytogatherformostpurposes.Whentheydo,theprecedingframeworkcanbeused.Forthepresent,itisusefultoexaminethetwotypesofdataseparatelyandconsideraspectsofthemodelthatarespecifictothetwotypesofapplications.21.7.1THEMULTINOMIALLOGITMODELTosetupthemodelthatapplieswhendataareindividualspecific,itwillhelptocon-sideranexample.SchmidtandStrauss(1975a,b)estimatedamodelofoccupational56Itisoccasionallylabeledthemultinomiallogitmodel,butthiswordingconflictswiththeusualnameforthemodeldiscussedinthenextsection,whichdiffersslightly.Althoughthedistinctionturnsouttobepurelyartificial,wewillmaintainitforthepresent.\nGreene-50240bookJune27,200222:39CHAPTER21✦ModelsforDiscreteChoice721choicebasedonasampleof1000observationsdrawnfromthePublicUseSampleforthreeyears,1960,1967,and1970.Foreachsample,thedataforeachindividualinthesampleconsistofthefollowing:1.Occupation:0=menial,1=bluecollar,2=craft,3=whitecollar,4=professional.2.Regressors:constant,education,experience,race,sex.ThemodelforoccupationalchoiceisβxejiProb(Yi=j)=4βx,j=0,1,...,4.(21-45)ekik=0(ThebinomiallogitofSections21.3and21.4isconvenientlyproducedasthespecialcaseofJ=1.)Themodelin(21-45)isamultinomiallogitmodel.57Theestimatedequationspro-videasetofprobabilitiesfortheJ+1choicesforadecisionmakerwithcharacteristicsxi.Beforeproceeding,wemustremoveanindeterminacyinthemodel.Ifwedefine∗βj=βj+qforanyvectorq,thenrecomputingtheprobabilitiesdefinedbelowusing∗βjinsteadofβjproducestheidenticalsetofprobabilitiesbecauseallthetermsinvolv-ingqdropout.Aconvenientnormalizationthatsolvestheproblemisβ0=0.(Thisarisesbecausetheprobabilitiessumtoone,soonlyJparametervectorsareneededtodeterminetheJ+1probabilities.)Therefore,theprobabilitiesareβxejiProb(Yi=j|xi)=Jβxforj=0,2,...,J,β0=0.(21-46)1+ekik=1TheformofthebinomialmodelexaminedinSection21.4resultsifJ=1.ThemodelimpliesthatwecancomputeJlog-oddsratiosPijln=xi(βj−βk)=xiβjifk=0.PikFromthepointofviewofestimation,itisusefulthattheoddsr
atio,Pj/Pk,doesnotdependontheotherchoices,whichfollowsfromtheindependenceofdisturbancesintheoriginalmodel.Fromabehavioralviewpoint,thisfactisnotveryattractive.WeshallreturntothisprobleminSection21.7.3.Thelog-likelihoodcanbederivedbydefining,foreachindividual,dij=1ifalter-nativejischosenbyindividuali,and0ifnot,fortheJ−1possibleoutcomes.Then,foreachi,oneandonlyoneofthedij’sis1.Thelog-likelihoodisageneralizationofthatforthebinomialprobitorlogitmodel:nJlnL=dijlnProb(Yi=j).i=1j=0Thederivativeshavethecharacteristicallysimpleform∂lnL=(dij−Pij)xiforj=1,...,J.∂βji57NerloveandPress(1973).\nGreene-50240bookJune27,200222:39722CHAPTER21✦ModelsforDiscreteChoiceTheexactsecondderivativesmatrixhasJ2K×Kblocks,∂2lnLn=−P[1(j=l)−P]xx,58ijilii∂βj∂βli=1where1(j=l)equals1ifjequalsland0ifnot.SincetheHessiandoesnotinvolvedij,thesearetheexpectedvalues,andNewton’smethodisequivalenttothemethodofscoring.Itisworthnotingthatthenumberofparametersinthismodelproliferateswiththenumberofchoices,whichisunfortunatebecausethetypicalcrosssectionsometimesinvolvesafairlylargenumberofregressors.Thecoefficientsinthismodelaredifficulttointerpret.Itistemptingtoassociateβjwiththejthoutcome,butthatwouldbemisleading.Bydifferentiating(21-46),wefindthatthemarginaleffectsofthecharacteristicsontheprobabilitiesare∂PJjδj==Pjβj−Pkβk=Pj[βj−β¯].(21-47)∂xik=0Therefore,everysubvectorofβenterseverymarginaleffect,boththroughtheprob-abilitiesandthroughtheweightedaveragethatappearsinδj.Thesevaluescanbecomputedfromtheparameterestimates.Althoughtheusualfocusisonthecoefficientestimates,equation(21-47)suggeststhatthereisatleastsomepotentialforconfusion.Note,forexample,thatforanyparticularxk,∂Pj/∂xkneednothavethesamesignasβjk.Standarderrorscanbeestimatedusingthedeltamethod.(SeeSection5.2.4.)Forpurposesofthecomputation,letβ=[0,β,β,...,β].Weincludethefixed0vector12jforoutcome0becausealthoughβ0=0,γ0=−P0β¯,whichisnot0.NoteaswellthatAsy.Cov[βˆ0,βˆj]=0forj=0,...,J.ThenJJ∂δj∂δjAsy.Var[δˆj]=Asy.Cov[βˆl,βˆm],∂βl∂βml=0m=0∂δj=[1(j=l)−Pl][PjI+δjx
]+Pj[δlx].∂βlFindingadequatefitmeasuresinthissettingpresentsthesamedifficultiesasinthebinomialmodels.Asbefore,itisusefultoreportthelog-likelihood.Ifthemodelcontainsnocovariatesandnoconstantterm,thenthelog-likelihoodwillbeJ1lnLc=njln.J+1j=0wherenjisthenumberofindividualswhochooseoutcomej.Iftheregressorvectorincludesonlyaconstantterm,thentherestrictedlog-likelihoodisJnJjlnL0=njln=njlnpj,nj=0j=058Ifthedatawereintheformofproportions,suchasmarketshares,thentheappropriatelog-likelihoodandderivativesarenipijandni(pij−Pij)xi,respectively.ThetermsintheHessianareijijmultipliedbyni.\nGreene-50240bookJune27,200222:39CHAPTER21✦ModelsforDiscreteChoice723wherepjisthesampleproportionofobservationsthatmakechoicej.Ifdesired,thelikelihoodratioindexcanalsobereported.Ausefultablewillgivealistingofhitsandmissesofthepredictionrule“predictYi=jifPˆjisthemaximumofthepredictedprobabilities.”5921.7.2THECONDITIONALLOGITMODELWhenthedataconsistofchoice-specificattributesinsteadofindividual-specificchar-acteristics,theappropriatemodelisβzeijProb(Yi=j|zi1,zi2,...,ziJ)=Jβz.(21-48)eijj=1Here,inaccordancewiththeconventionintheliterature,weletj=1,2,...,JforatotalofJalternatives.Themodelisotherwiseessentiallythesameasthemultinomiallogit.Evenmorecarewillberequiredininterpretingtheparameters,however.Onceagain,anexamplewillhelptofocusideas.Inthismodel,thecoefficientsarenotdirectlytiedtothemarginaleffects.Themarginaleffectsforcontinuousvariablescanbeobtainedbydifferentiating(21-48)withrespecttoxtoobtain∂Pj=[Pj(1(j=k)−Pk)]β,k=1,...,J.∂xk(Toavoidclutteringthenotation,wehavedroppedtheobservationsubscript.)ItisclearthatthroughitspresenceinPjandPk,everyattributesetxjaffectsalltheprobabilities.Henshersuggeststhatonemightprefertoreportelasticitiesoftheprobabilities.TheeffectofattributemofchoicekonPjwouldbe∂logPj=xkm[1(j=k)−Pk]βm.∂logxkmSincethereisnoambiguityaboutthescaleoftheprobabilityitself,whetheroneshouldreportthederivativesortheelasticitiesislargelyamatteroftaste.SomeofHensher’selasticityestimatesaregiveninT
able21.16lateroninthischapter.EstimationoftheconditionallogitmodelissimplestbyNewton’smethodorthemethodofscoring.Thelog-likelihoodisthesameasforthemultinomiallogitmodel.Onceagain,wedefinedij=1ifYi=jand0otherwise.ThennJlogL=dijlogProb(Yi=j).i=1j=1Marketshareandfrequencydataarecommoninthissetting.Ifthedataareinthisform,thentheonlychangeneededis,onceagain,todefinedijastheproportionorfrequency.59Unfortunately,itiscommonforthisruletopredictallobservationwiththesamevalueinanunbalancedsampleoramodelwithlittleexplanatorypower.\nGreene-50240bookJune27,200222:39724CHAPTER21✦ModelsforDiscreteChoiceBecauseofthesimpleformofL,thegradientandHessianhaveparticularlyconvenientJforms:Letx¯i=j=1Pijxij.Then,∂logLnJ=dij(xij−x¯i),∂βi=1j=1∂2logLnJ=−P(x−x¯)(x−x¯),ijijiiji∂β∂βi=1j=1Theusualproblemsoffitmeasuresappearhere.Thelog-likelihoodratioandtabula-tionofactualversuspredictedchoiceswillbeuseful.Therearetwopossibleconstrainedlog-likelihoods.Sincethemodelcannotcontainaconstantterm,theconstraintβ=0rendersallprobabilitiesequalto1/J.Theconstrainedlog-likelihoodforthisconstraintisthenLc=−nlnJ.Ofcourse,itisunlikelythatthishypothesiswouldfailtobere-jected.Alternatively,wecouldfitthemodelwithonlytheJ−1choice-specificconstants,whichmakestheconstrainedlog-likelihoodthesameasinthemultinomiallogitmodel,lnL∗=nlnpwhere,asbefore,nisthenumberofindividualswhochoose0jjjjalternativej.21.7.3THEINDEPENDENCEFROMIRRELEVANTALTERNATIVESWenotedearlierthattheoddsratiosinthemultinomiallogitorconditionallogitmod-elsareindependentoftheotheralternatives.Thispropertyisconvenientasregardsestimation,butitisnotaparticularlyappealingrestrictiontoplaceonconsumerbe-havior.ThepropertyofthelogitmodelwherebyPj/Pkisindependentoftheremainingprobabilitiesiscalledtheindependencefromirrelevantalternatives(IIA).Theindependenceassumptionfollowsfromtheinitialassumptionthatthedistur-bancesareindependentandhomoscedastic.Laterwewilldiscussseveralmodelsthathavebeendevelopedtorelaxthisassumption.Beforedoingso,weconsideratestthathasbeendevelo
pedfortestingthevalidityoftheassumption.HausmanandMcFadden(1984)suggestthatifasubsetofthechoicesettrulyisirrelevant,omittingitfromthemodelaltogetherwillnotchangeparameterestimatessystematically.Exclusionofthesechoiceswillbeinefficientbutwillnotleadtoinconsistency.Butiftheremainingoddsratiosarenottrulyindependentfromthesealternatives,thentheparameterestimatesobtainedwhenthesechoicesareincludedwillbeinconsistent.ThisobservationistheusualbasisforHausman’sspecificationtest.Thestatisticisχ2=(βˆ−βˆ)[Vˆ−Vˆ]−1(βˆ−βˆ),sfsfsfwheresindicatestheestimatorsbasedontherestrictedsubset,findicatestheestimatorbasedonthefullsetofchoices,andVˆsandVˆfaretherespectiveestimatesoftheasymptoticcovariancematrices.Thestatistichasalimitingchi-squareddistributionwithKdegreesoffreedom.6060McFadden(1987)showshowthishypothesiscanalsobetestedusingaLagrangemultipliertest.\nGreene-50240bookJune27,200222:39CHAPTER21✦ModelsforDiscreteChoice72521.7.4NESTEDLOGITMODELSIftheindependencefromirrelevantalternativestestfails,thenanalternativetothemultinomiallogitmodelwillbeneeded.Anaturalalternativeisamultivariateprobitmodel:Uj=βxj+εj,j=1,...,J,[ε1,ε2,...,εJ]∼N[0,].Wehadconsideredthismodelearlierbutfoundthatasageneralmodelofconsumerchoice,itsfailingswerethepracticaldifficultyofcomputingthemultinormalintegralandestimationofanunrestrictedcorrelationmatrix.HausmanandWise(1978)pointoutthatforamodelofconsumerchoice,theprobitmodelmaynotbeasimpracticalasitmightseem.First,forJchoices,thecomparisonsimplicitinUj>Ukfork=jinvolvetheJ−1differences,εj−εk.Thus,startingwithaJ-dimensionalproblem,weneedonlyconsiderderivativesof(J−1)-orderprobabilities.Therefore,tocometoaconcreteexample,amodelwithfourchoicesrequiresonlytheevaluationofbi-variatenormalintegrals,which,albeitstillcomplicatedtoestimate,iswellwithinthereceivedtechnology.Forlargermodels,however,otherspecificationshaveprovedmoreuseful.Onewaytorelaxthehomoscedasticityassumptionintheconditionallogitmodelthatalsoprovidesanintuitivelyappealingstructureistogroupthealternative
sintosubgroupsthatallowthevariancetodifferacrossthegroupswhilemaintainingtheIIAassumptionwithinthegroups.Thisspecificationdefinesanestedlogitmodel.Tofixideas,itisusefultothinkofthisspecificationasatwo-(ormore)levelchoiceproblem(although,onceagain,themodelarisesasamodificationofthestochasticspecificationintheoriginalconditionallogitmodel,notasamodelofbehavior).Suppose,then,thattheJalternativescanbedividedintoLsubgroupssuchthatthechoicesetcanbewritten[c1,...,cJ]=(c1|1,...,cJ1|1),...,(c1|L,...,cJL|L).Logically,wemaythinkofthechoiceprocessasthatofchoosingamongtheLchoicesetsandthenmakingthespecificchoicewithinthechosenset.Thismethodproducesatreestructure,whichfortwobranchesand,say,fivechoicesmightlookasfollows:ChoiceBranch1Branch2c1|1c2|1c1|2c2|2c3|2Supposeaswellthatthedataconsistofobservationsontheattributesofthechoicesxj|landattributesofthechoicesetszl.Toderivethemathematicalformofthemodel,webeginwiththeunconditionalprobabilityxβ+zγej|llProb[twigj,branchl]=Pjl=LJlxβ+zγ.ej|lll=1j=1\nGreene-50240bookJune27,200222:39726CHAPTER21✦ModelsforDiscreteChoiceNowwritethisprobabilityasJxβLxβzγlej|lezlγej|lelj=1l=1Pjl=Pj|lPl=JlxβLzγLJxβ+zγ.ej|ll=1ellej|llj=1l=1j=1DefinetheinclusivevalueforthelthbranchasJlxβIl=lnej|l.j=1Then,aftercancelingtermsandusingthisresult,wefindxβzγ+τIej|lelllPj|l=JlxβandPl=Lzγ+τlIl,ej|lelj=1l=1wherethenewparametersτlmustequal1toproducetheoriginalmodel.Therefore,weusetherestrictionτl=1torecovertheconditionallogitmodel,andtheprecedingequationjustwritesthismodelinanotherform.Thenestedlogitmodelarisesifthisrestrictionisrelaxed.Theinclusivevaluecoefficients,unrestrictedinthisfashion,allowthemodeltoincorporatesomedegreeofheteroscedasticity.Withineachbranch,theIIArestrictioncontinuestohold.Theequalvarianceofthedisturbanceswithinthejthbrancharenowπ2σ2=.61j6τjWithτj=1,thisrevertstothebasicresultforthemultinomiallogitmodel.Asusual,thecoefficientsinthemodelarenotdirectlyinterpretable.Thederivativesthatdescribecovariationoftheattributesandprobabilitiesare∂lnProb[choicec
,branchb]=1(b=B)[1(c=C)−PC|B]∂x(k)inchoiceCandbranchB+τB[1(b=B)−PB]PC|Bβk.Thenestedlogitmodelhasbeenextendedtothreeandhigherlevels.Thecomplexityofthemodelincreasesgeometricallywiththenumberoflevels.Butthemodelhasbeenfoundtobeextremelyflexibleandiswidelyusedformodelingconsumerchoiceandinthemarketingandtransportationliteratures,tonameafew.Therearetwowaystoestimatetheparametersofthenestedlogitmodel.Alimitedinformation,two-stepmaximumlikelihoodapproachcanbedoneasfollows:1.Estimateβbytreatingthechoicewithinbranchesasasimpleconditionallogitmodel.2.Computetheinclusivevaluesforallthebranchesinthemodel.EstimateγandtheτparametersbytreatingthechoiceamongbranchesasaconditionallogitmodelwithattributeszlandIl.61SeeHensher,Louviere,andSwaite(2000).\nGreene-50240bookJune27,200222:39CHAPTER21✦ModelsforDiscreteChoice727Sincethisapproachisatwo-stepestimator,theestimateoftheasymptoticcovariancematrixoftheestimatesatthesecondstepmustbecorrected.[SeeSection4.6,McFadden(1984),andGreene(1995a,Chapter25).]Forfullinformationmaximumlikelihood(FIML)estimationofthemodel,thelog-likelihoodisnlnL=ln[Prob(twig|branch)]×Prob(branch)]i.i=1Theinformationmatrixisnotblockdiagonalinβand(γ,τ),soFIMLestimationwillbemoreefficientthantwo-stepestimation.Tospecifythenestedlogitmodel,itisnecessarytopartitionthechoicesetintobranches.Sometimestherewillbeanaturalpartition,suchasintheexamplegivenbyMaddala(1983)whenthechoiceofresidenceismadefirstbycommunity,thenbydwellingtypewithinthecommunity.Inotherinstances,however,thepartitioningofthechoicesetisadhocandleadstothetroublingpossibilitythattheresultsmightbedepen-dentonthebranchessodefined.(Manystudiesinthisliteraturepresentseveralsetsofresultsbasedondifferentspecificationsofthetreestructure.)Thereisnowell-definedtestingprocedurefordiscriminatingamongtreestructures,whichisaproblematicas-pectofthemodel.21.7.5AHETEROSCEDASTICLOGITMODELBhat(1995)andAllenbyandGinter(1995)havedevelopedanextensionofthecon-ditionallogitmodelthatworksaroundthedifficultyofspecifyingthetreefor
a nested model. Their model is based on the same random utility structure as before,

$$U_{ij} = \beta' x_{ij} + \varepsilon_{ij}.$$

The logit model arises from the assumption that $\varepsilon_{ij}$ has a homoscedastic extreme value distribution with common variance $\pi^2/6$. The authors' proposed model, the heteroscedastic extreme value (HEV) model, simply relaxes the assumption of equal variances. Since the comparisons are all pairwise, one of the variances is set to 1.0; the same comparisons of utilities will result if all equations are multiplied by the same constant, so the indeterminacy is removed by setting one of the variances to one. The model that remains, then, is exactly as before, with the additional assumption that $\text{Var}[\varepsilon_{ij}] = \sigma_j^2$, with $\sigma_J = 1.0$.

21.7.6 MULTINOMIAL MODELS BASED ON THE NORMAL DISTRIBUTION

A natural alternative model that relaxes the independence restrictions built into the multinomial logit (MNL) model is the multinomial probit (MNP) model. The structural equations of the MNP model are

$$U_j = x_j'\beta_j + \varepsilon_j, \quad j = 1, \ldots, J, \quad [\varepsilon_1, \varepsilon_2, \ldots, \varepsilon_J] \sim N[0, \Sigma].$$

The term in the log-likelihood that corresponds to the choice of alternative $q$ is

$$\text{Prob}[\text{choice } q] = \text{Prob}[U_q > U_j,\ j = 1, \ldots, J,\ j \neq q].$$

The probability for this occurrence is

$$\text{Prob}[\text{choice } q] = \text{Prob}[\varepsilon_1 - \varepsilon_q < (x_q - x_1)'\beta, \ldots, \varepsilon_J - \varepsilon_q < (x_q - x_J)'\beta]$$

for the $J - 1$ other choices, which is a cumulative probability from a $(J - 1)$-variate normal distribution. As in the HEV model, since we are only making comparisons, one of the variances in this $J - 1$ variate structure, that is, one of the diagonal elements in the reduced $\Sigma$, must be normalized to 1.0. Since only comparisons are ever observable in this model, for identification, $J - 1$ of the covariances must also be normalized, to zero. The MNP model allows an unrestricted $(J - 1) \times (J - 1)$ correlation structure and $J - 2$ free standard deviations for the disturbances in the model. (Thus, a two choice model returns to the univariate probit model of Section 21.2.) For more than two choices, this specification is far more general than the MNL model, which assumes that $\Sigma = \mathbf{I}$. (The scaling is absorbed in the coefficient vector in the MNL model.)

The main obstacle to implementation of the MNP model has been the difficulty in computing the multivariate normal probabilities for any dimensionality higher than 2. Recent results on accurate simulation of multinormal integrals, however, have made
estimation of the MNP model feasible. (See Section E.5.6 and a symposium in the November 1994 issue of the Review of Economics and Statistics.) Yet some practical problems remain. Computation is exceedingly time consuming. It is also necessary to ensure that $\Sigma$ remains a positive definite matrix. One way often suggested is to construct the Cholesky decomposition of $\Sigma$, $\mathbf{L}\mathbf{L}'$, where $\mathbf{L}$ is a lower triangular matrix, and estimate the elements of $\mathbf{L}$. Maintaining the normalizations and zero restrictions will still be cumbersome, however. An alternative is to estimate the correlations, $\mathbf{R}$, and a diagonal matrix of standard deviations, $\mathbf{S} = \text{diag}(\sigma_1, \ldots, \sigma_{J-2}, 1, 1)$, separately. The normalizations, $R_{jj} = 1$, and exclusions, $R_{Jl} = 0$, are simple to impose, and $\Sigma$ is just $\mathbf{S}\mathbf{R}\mathbf{S}$. $\mathbf{R}$ is otherwise restricted only in that $-1$ [...]

[...] 0] $= (1 - F) \times [1 - e^{-\lambda_i}]$.

Lambert (1992) and Greene (1994) consider a number of alternative formulations, including logit and probit models discussed in Sections 21.3 and 21.4, for the probability of the two regimes.

Both of these modifications substantially alter the Poisson formulation. First, note that the equality of the mean and variance of the distribution no longer follows; both modifications induce overdispersion. On the other hand, the overdispersion does not arise from heterogeneity; it arises from the nature of the process generating the zeros. As such, an interesting identification problem arises in this model. If the data do appear to be characterized by overdispersion, then it seems less than obvious whether it should be attributed to heterogeneity or to the regime splitting mechanism. Mullahy (1986) argues the point more strongly. He demonstrates that overdispersion will always induce excess zeros. As such, in a splitting model, we are likely to misinterpret the excess zeros as due to the splitting process instead of the heterogeneity.

[75] The model is variously labeled the "With Zeros," or WZ, model [Mullahy (1986)], the "Zero Inflated Poisson," or ZIP, model [Lambert (1992)], and "Zero-Altered Poisson," or ZAP, model [Greene (1994)].

It might be of interest to test simply whether there is a regime splitting mechanism at work or not. Unfortunately, the basic model and the zero-inflated model are not nested. Setting the parameters of the splitting model to zero, for example, does not produce $\text{Prob}[z = 0] = 0$. In the probit case, this probability becomes 0.5, which maintains the regime split. The preceding tests for over- or underdispersion would be rather indirect. What is desired is a test of non-Poissonness. An alternative distribution may (but need not) produce a systematically different proportion of zeros than the Poisson. Testing for a different distribution, as opposed to a different set of parameters, is a difficult procedure. Since the hypotheses are necessarily nonnested, the power of any test is a function of the alternative hypothesis and may, under some, be small. Vuong (1989) has proposed a test statistic for nonnested models that is well suited for this setting when the alternative distribution can be specified. Let $f_j(y_i \mid x_i)$ denote the predicted probability that the random variable $Y$ equals $y_i$ under the assumption that the distribution is $f_j(y_i \mid x_i)$, for $j = 1, 2$, and let

$$m_i = \log\left[\frac{f_1(y_i \mid x_i)}{f_2(y_i \mid x_i)}\right].$$

Then Vuong's statistic for testing the nonnested hypothesis of Model 1 versus Model 2 is

$$v = \frac{\sqrt{n}\left[\frac{1}{n}\sum_{i=1}^{n} m_i\right]}{\sqrt{\frac{1}{n}\sum_{i=1}^{n} (m_i - \bar{m})^2}}.$$

This is the standard statistic for testing the hypothesis that $E[m_i]$ equals zero. Vuong shows that $v$ has a limiting standard normal distribution. As he notes, the statistic is bidirectional. If $|v|$ is less than two, then the test does not favor one model or the other. Otherwise, large values favor Model 1 whereas small (negative) values favor Model 2. Carrying out the test requires estimation of both models and computation of both sets of predicted probabilities.

In Greene (1994), it is shown that the Vuong test has some power to discern this phenomenon. The logic of the testing procedure is to allow for overdispersion by specifying a negative binomial count data process, then examine whether, even allowing for the overdispersion, there still appear to be excess zeros. In his application, that appears to be the case.

Example 21.12 A Split Population Model for Major Derogatory Reports

Greene (1995c) estimated a model of consumer behavior in which the dependent variable of interest was the number of major derogatory reports recorded in the credit history for a sample of applicants for a type of credit card. The basic model predicts $y_i$, the number of major derogatory credit reports, as a function of $x_i$ = [1, age, income, average expenditure]. The data for the model appear in Appendix Table F21.4. There are 1,319 observations in the sample (10% of the original data set). Inspection of the data reveals a preponderance of zeros. Indeed, of 1,319 observations, 1,060 have $y_i = 0$, whereas of the remaining 259, 137 have 1, 50 have 2, 24 have 3, 17 have 4, and 11 have 5; the remaining 20 range from 6 to 14. Thus, for a Poisson distribution, these data are actually a bit extreme. We propose to use Lambert's zero inflated Poisson model instead, with the Poisson distribution built around

$$\ln \lambda_i = \beta_1 + \beta_2\,\text{age} + \beta_3\,\text{income} + \beta_4\,\text{expenditure}.$$

For the splitting model, we use a logit model, with covariates $z$ = [1, age, income, own/rent]. The estimates are shown in Table 21.21. Vuong's diagnostic statistic appears to confirm intuition that the Poisson model does not adequately describe the data; the value is 6.9788. Using the model parameters to compute a prediction of the number of zeros, it is clear that the splitting model does perform better than the basic Poisson regression.

TABLE 21.21 Estimates of a Split Population Model

                     Poisson and Logit Models              Split Population Model
Variable             Poisson for y    Logit for y > 0      Poisson for y    Logit for y > 0
Constant             −0.8196          −2.2442              1.0010           2.1540
                     (0.1453)         (0.2515)             (0.1267)         (0.2900)
Age                  0.007181         0.02245              −0.005073        −0.02469
                     (0.003978)       (0.007313)           (0.003218)       (0.008451)
Income               0.07790          0.06931              0.01332          −0.1167
                     (0.02394)        (0.04198)            (0.02249)        (0.04941)
Expend               −0.004102                             −0.002359
                     (0.0003740)                           (0.0001948)
Own/Rent                              −0.3766                               0.3865
                                      (0.1578)                              (0.1709)
Log L                −1396.719        −645.5649            −1093.0280
n P̂(0 | x̂)           938.6                                 1061.5

21.10 SUMMARY AND CONCLUSIONS

This chapter has surveyed techniques for modeling discrete choice. We examined four classes of models: binary choice, ordered choice, multinomial choice, and models for counts. The first three of these are quite far removed from the regression models (linear and nonlinear) that have been the focus of the preceding 20 chapters. The most important difference concerns the modeling approach. Up to this point, we have been primarily interested in modeling the conditional mean function for outcomes that vary continuously. In this chapter, we have shifted our approach to one of modeling the conditional probabilities of events.

Modeling binary choice (the decision between two alternatives) is a growth area in the applied econometrics literature. Maximum likelihood estimation of fully parameterized models remains the mainstay of the literature. But we also considered semiparametric and nonparametric forms of the model and examined models for time series and panel data. The ordered choice model is a natural extension of the binary choice setting and also a convenient bridge between models of choice between two alternatives and more complex models of choice among multiple alternatives. Multinomial choice modeling is likewise a large field, both within economics and, especially, in many other fields, such as marketing, transportation, political science, and so on. The multinomial logit model and many variations of it provide an especially rich framework within which modelers have carefully matched behavioral modeling to empirical specification and estimation. Finally, models of count data are closer to regression models than the other three fields. The Poisson regression model is essentially a nonlinear regression, but, as in the other cases, it is more fruitful to do the modeling in terms of the probabilities of discrete choice rather than as a form of regression analysis.

Key Terms and Concepts

• Attributes
• Binary choice model
• Bivariate probit
• Bootstrapping
• Butler and Moffitt method
• Choice based sampling
• Chow test
• Conditional likelihood function
• Conditional logit
• Count data
• Fixed effects model
• Full information ML
• Generalized residual
• Goodness of fit measure
• Grouped data
• Heterogeneity
• Heteroscedasticity
• Incidental parameters problem
• Inclusive value
• Independence from irrelevant alternatives
• Index function model
• Individual data
• Initial conditions
• Kernel density estimator
• Kernel function
• Lagrange multiplier test
• Latent regression
• Likelihood equations
• Likelihood ratio test
• Limited information ML
• Linear probability model
• Logit
• Marginal effects
• Maximum likelihood
• Maximum score estimator
• Maximum simulated likelihood
• Mean-squared deviation
• Minimal sufficient statistic
• Minimum chi-squared estimator
• Multinomial logit
• Multinomial probit
• Multivariate probit
• Negative binomial model
• Nested logit
• Nonnested models
• Normit
• Ordered choice model
• Overdispersion
• Persistence
• Poisson model
• Probit
• Proportions data
• Quadrature
• Qualitative choice
• Qualitative response
• Quasi-MLE
• Random coefficients
• Random effects model
• Random parameters model
• Random utility model
• Ranking
• Recursive model
• Robust covariance estimation
• Sample selection
• Scoring method
• Semiparametric estimation
• State dependence
• Unbalanced sample
• Unordered
• Weibull model

Exercises

1. A binomial probability model is to be based on the following index function model:

$$y^* = \alpha + \beta d + \varepsilon,$$
$$y = 1 \text{ if } y^* > 0, \quad y = 0 \text{ otherwise}.$$

The only regressor, $d$, is a dummy variable. The data consist of 100 observations that have the following:

              y = 0    y = 1
   d = 0       24       28
   d = 1       32       16

Obtain the maximum likelihood estimators of $\alpha$ and $\beta$, and estimate the asymptotic standard errors of your estimates. Test the hypothesis that $\beta$ equals zero by using a Wald test (asymptotic t test) and a likelihood ratio test. Use the probit model and then repeat, using the logit model. Do your results change? [Hint: Formulate the log-likelihood in terms of $\alpha$ and $\delta = \alpha + \beta$.]

2. Suppose that a linear probability model is to be fit to a set of observations on a dependent variable $y$ that takes values zero and one, and a single regressor $x$ that varies continuously across observations. Obtain the exact expressions for the least squares slope in the regression in terms of the mean(s) and variance of $x$, and interpret the result.

3. Given the data set

   y   1   0   0   1   1   0   0   1   1   1
   x   9   2   5   4   6   7   3   5   2   6

estimate a probit model and test the hypothesis that $x$ is not influential in determining the probability that $y$ equals one.

4. Construct the Lagrange multiplier statistic for testing the hypothesis that all the slopes (but not the constant term) equal zero in the binomial logit model. Prove that the Lagrange multiplier statistic is $nR^2$ in the regression of $(y_i - P)$ on the $x$s, where $P$ is the sample proportion of 1s.

5. We are interested in the ordered probit model. Our data consist of 250 observations, of which the responses are

   y    0    1    2    3    4
   n   50   40   45   80   35

Using the preceding data, obtain maximum likelihood estimates of the unknown parameters of the model. [Hint: Consider the probabilities as the unknown parameters.]

6. The following hypothetical data give the participation rates in a particular type of recycling program and the number of trucks purchased for collection by 10 towns in a small mid-Atlantic state:

   Town               1     2     3     4     5     6     7     8     9    10
   Trucks           160   250   170   365   210   206   203   305   270   340
   Participation %   11    74     8    87    62    83    48    84    71    79

The town of Eleven is contemplating initiating a recycling program but wishes to achieve a 95 percent rate of participation. Using a probit model for your analysis,
a. How many trucks would the town expect to have to purchase in order to achieve their goal? [Hint: See Section 21.4.6.] Note that you will use $n_i = 1$.
b. If trucks cost $20,000 each, then is a goal of 90 percent reachable within a budget of $6.5 million? (That is, should they expect to reach the goal?)
c. According to your model, what is the marginal value of the 301st truck in terms of the increase in the percentage participation?

7. A data set consists of $n = n_1 + n_2 + n_3$ observations on $y$ and $x$. For the first $n_1$ observations, $y = 1$ and $x = 1$. For the next $n_2$ observations, $y = 0$ and $x = 1$. For the last $n_3$ observations, $y = 0$ and $x = 0$. Prove that neither (21-19) nor (21-21) has a solution.

8. Data on $t$ = strike duration and $x$ = unanticipated industrial production for a number of strikes in each of 9 years are given in Appendix Table F22.1. Use the Poisson regression model discussed in Section 21.9 to determine whether $x$ is a significant determinant of the number of strikes in a given year.

9. Asymptotics. Explore whether averaging individual marginal effects gives the same answer as computing the marginal effect at the mean.

10. Prove (21-28).

11. In the panel data models estimated in Example 21.5.1, neither the logit nor the probit model provides a framework for applying a Hausman test to determine whether fixed or random effects is preferred. Explain. (Hint: Unlike our application in the linear model, the incidental parameters problem persists here.)

22 LIMITED DEPENDENT VARIABLE AND DURATION MODELS

22.1 INTRODUCTION

This chapter is concerned with truncation and censoring.[1] The effect of truncation occurs when sample data are drawn from a subset of a larger population of interest. For example, studies of income based on incomes above or below some poverty line may be of limited usefulness for inference about the whole population. Truncation is essentially a characteristic of the distribution from which the sample data are drawn. Censoring is a
more common problem in recent studies. To continue the example, suppose that instead of being unobserved, all incomes below the poverty line are reported as if they were at the poverty line. The censoring of a range of values of the variable of interest introduces a distortion into conventional statistical results that is similar to that of truncation. Unlike truncation, however, censoring is essentially a defect in the sample data. Presumably, if they were not censored, the data would be a representative sample from the population of interest.

This chapter will discuss four broad topics: truncation, censoring, a form of truncation called the sample selection problem, and a class of models called duration models. Although most empirical work on the first three involves censoring rather than truncation, we will study the simpler model of truncation first. It provides most of the theoretical tools we need to analyze models of censoring and sample selection. The fourth topic, on models of duration (When will a spell of unemployment or a strike end?), could reasonably stand alone. It does in countless articles and a library of books.[2] We include our introduction to this subject in this chapter because in most applications, duration modeling involves censored data and it is thus convenient to treat duration here (and because we are nearing the end of our survey and yet another chapter seems unwarranted).

[1] Five of the many surveys of these topics are Dhrymes (1984), Maddala (1977b, 1983, 1984), and Amemiya (1984). The last is part of a symposium on censored and truncated regression models. A survey that is oriented toward applications and techniques is Long (1997). Some recent results on non- and semiparametric estimation appear in Lee (1996).

[2] For example, Lancaster (1990) and Kiefer (1985).

22.2 TRUNCATION

In this section, we are concerned with inferring the characteristics of a full population from a sample drawn from a restricted part of that population.

22.2.1 TRUNCATED DISTRIBUTIONS

A truncated distribution is the part of an untruncated distribution that is above or below some specified value. For instance, in Example 22.2, we are given a characteristic of the distribution of incomes above $100,000. This subset is a part of the full distribution of incomes which range from zero to (essentially) infinity.

THEOREM 22.1 Density of a Truncated Random Variable
If a continuous random variable $x$ has pdf $f(x)$ and $a$ is a constant, then

$$f(x \mid x > a) = \frac{f(x)}{\text{Prob}(x > a)}.\text{[3]}$$

The proof follows from the definition of conditional probability and amounts merely to scaling the density so that it integrates to one over the range above $a$. Note that the truncated distribution is a conditional distribution.

[3] The case of truncation from above instead of below is handled in an analogous fashion and does not require any new results.

Most recent applications based on continuous random variables use the truncated normal distribution. If $x$ has a normal distribution with mean $\mu$ and standard deviation $\sigma$, then

$$\text{Prob}(x > a) = 1 - \Phi\left(\frac{a - \mu}{\sigma}\right) = 1 - \Phi(\alpha),$$

where $\alpha = (a - \mu)/\sigma$ and $\Phi(\cdot)$ is the standard normal cdf. The density of the truncated normal distribution is then

$$f(x \mid x > a) = \frac{f(x)}{1 - \Phi(\alpha)} = \frac{(2\pi\sigma^2)^{-1/2} e^{-(x - \mu)^2/(2\sigma^2)}}{1 - \Phi(\alpha)} = \frac{\frac{1}{\sigma}\,\phi\left(\frac{x - \mu}{\sigma}\right)}{1 - \Phi(\alpha)},$$

where $\phi(\cdot)$ is the standard normal pdf. The truncated standard normal distribution, with $\mu = 0$ and $\sigma = 1$, is illustrated for $a = -0.5$, 0, and 0.5 in Figure 22.1. Another truncated distribution which has appeared in the recent literature, this one for a discrete random variable, is the truncated at zero Poisson distribution,

$$\text{Prob}[Y = y \mid y > 0] = \frac{(e^{-\lambda}\lambda^y)/y!}{\text{Prob}[Y > 0]} = \frac{(e^{-\lambda}\lambda^y)/y!}{1 - \text{Prob}[Y = 0]} = \frac{(e^{-\lambda}\lambda^y)/y!}{1 - e^{-\lambda}}, \quad \lambda > 0,\ y = 1, \ldots$$

This distribution is used in models of uses of recreation and other kinds of facilities where observations of zero uses are discarded.[4]

For convenience in what follows, we shall call a random variable whose distribution is truncated a truncated random variable.

[4] See Shaw (1988).

FIGURE 22.1 Truncated Normal Distributions. (Vertical axis: Density; horizontal axis: x; the truncation point and the mean of each distribution are marked.)

22.2.2 MOMENTS OF TRUNCATED DISTRIBUTIONS

We are usually interested in the mean and variance of the truncated random variable. They would be obtained by the general formula:

$$E[x \mid x > a] = \int_a^{\infty} x\, f(x \mid x > a)\, dx$$

for the mean and likewise for the variance.

Example 22.1 Truncated Uniform Distribution

If $x$ has a standard uniform distribution, denoted $U(0, 1)$, then

$$f(x) = 1, \quad 0 \le x \le 1.$$

The truncated at $x = \frac{1}{3}$ distribution is also uniform;

$$f\left(x \mid x > \tfrac{1}{3}\right) = \frac{f(x)}{\text{Prob}\left(x > \tfrac{1}{3}\right)} = \frac{1}{2/3} = \frac{3}{2}, \quad \tfrac{1}{3} \le x \le 1.$$

The expected valu
eis1132Ex|x>=xdx=.3231/3ForavariabledistributeduniformlybetweenLandU,thevarianceis(U−L)2/12.Thus,11Varx|x>=.32711Themeanandvarianceoftheuntruncateddistributionareand,respectively.212\nGreene-50240bookJune28,200217:5CHAPTER22✦LimitedDependentVariableandDurationModels759Example22.1illustratestworesults.1.Ifthetruncationisfrombelow,thenthemeanofthetruncatedvariableisgreaterthanthemeanoftheoriginalone.Ifthetruncationisfromabove,thenthemeanofthetruncatedvariableissmallerthanthemeanoftheoriginalone.ThisisclearlyvisibleinFigure22.1.2.Truncationreducesthevariancecomparedwiththevarianceintheuntruncateddistribution.Henceforth,weshallusethetermstruncatedmeanandtruncatedvariancetorefertothemeanandvarianceoftherandomvariablewithatruncateddistribution.Forthetruncatednormaldistribution,wehavethefollowingtheorem:5THEOREM22.2MomentsoftheTruncatedNormalDistributionIfx∼N[µ,σ2]andaisaconstant,thenE[x|truncation]=µ+σλ(α),(22-1)Var[x|truncation]=σ2[1−δ(α)],(22-2)whereα=(a−µ)/σ,φ(α)isthestandardnormaldensityandλ(α)=φ(α)/[1−(α)]iftruncationisx>a,(22-3a)λ(α)=−φ(α)/(α)iftruncationisx4.605]=4.956.ItalsotoldusthatProb[y>4.605]=0.02.FromTheorem22.2,σφ(α)E[y|y>4.605]=µ+,1−(α)whereα=(4.605−µ)/σ.Wealsoknowthat(α)=0.98,soα=−1(0.98)=2.054.Weinfer,then,that(a)2.054=(4.605−µ)/σ.Inaddition,givenα=2.054,φ(α)=φ(2.054)=0.0484.From(22-1),then,4.956=µ+σ(0.0484/0.02)or(b)4.956=µ+2.420σ.Thesolutionsto(a)and(b)areµ=2.635andσ=0.959.Toobtainthemeanincome,wenowusetheresultthatify∼N[µ,σ2]andx=ey,thenE[x]=E[ey]=eµ+σ2/2.InsertingourvaluesforµandσgivesE[x]=$22,087.The1987StatisticalAbstractoftheUnitedStateslistedaveragehouseholdincomeacrossallgroupsfortheUnitedStatesasabout$25,000.Sotheestimate,basedonsurprisinglylittleinformation,wouldhavebeenrelativelygood.ThesemeagerdatadidindeedtellussomethingabouttheaverageAmerican.22.2.3THETRUNCATEDREGRESSIONMODELInthemodeloftheearlierexamples,wenowassumethatµ=xβiiisthedeterministicpartoftheclassicalregressionmodel.Theny=xβ+ε,iiiwhereε|x∼N[0,σ2],iisothaty|x∼N[xβ,σ2].(22-5)iiiWe
are interested in the distribution of $y_i$ given that $y_i$ is greater than the truncation point $a$. This is the result described in Theorem 22.2. It follows that
$$E[y_i \mid y_i > a] = x_i'\beta + \sigma\,\frac{\phi[(a - x_i'\beta)/\sigma]}{1 - \Phi[(a - x_i'\beta)/\sigma]}. \tag{22-6}$$
The conditional mean is therefore a nonlinear function of $a$, $\sigma$, $x$ and $\beta$.

The marginal effects in this model in the subpopulation can be obtained by writing
$$E[y_i \mid y_i > a] = x_i'\beta + \sigma\lambda(\alpha_i), \tag{22-7}$$
where now $\alpha_i = (a - x_i'\beta)/\sigma$. For convenience, let $\lambda_i = \lambda(\alpha_i)$ and $\delta_i = \delta(\alpha_i)$. Then
$$\begin{aligned}
\frac{\partial E[y_i \mid y_i > a]}{\partial x_i} &= \beta + \sigma\,(d\lambda_i/d\alpha_i)\,\frac{\partial \alpha_i}{\partial x_i} \\
&= \beta + \sigma\left(\lambda_i^2 - \alpha_i\lambda_i\right)(-\beta/\sigma) \\
&= \beta\left(1 - \lambda_i^2 + \alpha_i\lambda_i\right) \\
&= \beta(1 - \delta_i).
\end{aligned} \tag{22-8}$$
Note the appearance of the truncated variance. Since the truncated variance is between zero and one, we conclude that for every element of $x_i$, the marginal effect is less than the corresponding coefficient. There is a similar attenuation of the variance. In the subpopulation $y_i > a$, the regression variance is not $\sigma^2$ but
$$\operatorname{Var}[y_i \mid y_i > a] = \sigma^2(1 - \delta_i). \tag{22-9}$$
Whether the marginal effect in (22-7) or the coefficient $\beta$ itself is of interest depends on the intended inferences of the study. If the analysis is to be confined to the subpopulation, then (22-7) is of interest. If the study is intended to extend to the entire population, however, then it is the coefficients $\beta$ that are actually of interest.

One's first inclination might be to use ordinary least squares to estimate the parameters of this regression model. For the subpopulation from which the data are drawn, we could write (22-6) in the form
$$y_i \mid y_i > a = E[y_i \mid y_i > a] + u_i = x_i'\beta + \sigma\lambda_i + u_i, \tag{22-10}$$
where $u_i$ is $y_i$ minus its conditional expectation. By construction, $u_i$ has a zero mean, but it is heteroscedastic:
$$\operatorname{Var}[u_i] = \sigma^2\left(1 - \lambda_i^2 + \lambda_i\alpha_i\right) = \sigma^2(1 - \delta_i),$$
which is a function of $x_i$. If we estimate (22-10) by ordinary least squares regression of $y$ on $X$, then we have omitted a variable, the nonlinear term $\lambda_i$. All the biases that arise because of an omitted variable can be expected.⁷

Without some knowledge of the distribution of $x$, it is not possible to determine how serious the bias is likely to be. A result obtained by Cheung and Goldberger (1984) is broadly suggestive. If $E[x \mid y]$ in the full population is a linear function of $y$, then $\operatorname{plim} b = \beta\tau$ for some proportionality constant $\tau$. This result is consistent with the widely observed (albeit rather rough) proportionality relationship between least squares estimates of this model and consistent maximum likelihood estimates.⁸ The proportionality result appears to be quite general. In applications, it is usually found that, compared with consistent maximum likelihood estimates, the OLS estimates are biased toward zero. (See Example 22.4.)

22.3 CENSORED DATA

A very common problem in microeconomic data is censoring of the dependent variable. When the dependent variable is censored, values in a certain range are all transformed to (or reported as) a single value. Some examples that have appeared in the empirical literature are as follows:⁹
1. Household purchases of durable goods [Tobin (1958)],
2. The number of extramarital affairs [Fair (1977, 1978)],
3. The number of hours worked by a woman in the labor force [Quester and Greene (1982)],
4. The number of arrests after release from prison [Witte (1980)],
5. Household expenditure on various commodity groups [Jarque (1987)],
6. Vacation expenditures [Melenberg and van Soest (1996)].

Each of these studies analyzes a dependent variable that is zero for a significant fraction of the observations. Conventional regression methods fail to account for the qualitative difference between limit (zero) observations and nonlimit (continuous) observations.

⁷ See Heckman (1979) who formulates this as a "specification error."
⁸ See the appendix in Hausman and Wise (1977) and Greene (1983) as well.
⁹ More extensive listings may be found in Amemiya (1984) and Maddala (1983).

22.3.1 THE CENSORED NORMAL DISTRIBUTION

The relevant distribution theory for a censored variable is similar to that for a truncated one. Once again, we begin with the normal distribution, as much of the received work has been based on an assumption of normality. We also assume that the censoring point is zero, although this is only a convenient normalization. In a truncated distribution, only the part of distribution above $y = 0$ is relevant to our computations. To make the distribution integrate to one, we scale it up by the probability that an observation in the untruncated population falls in the range that interests us. When data are censored, the distribution that applies to the sample data is a mixture of discrete and continuous distributions. Figure
22.2 illustrates the effects. To analyze this distribution, we define a new random variable $y$ transformed from the original one, $y^*$, by
$$y = 0 \ \text{ if } y^* \le 0, \qquad y = y^* \ \text{ if } y^* > 0.$$
The distribution that applies if $y^* \sim N[\mu, \sigma^2]$ is $\operatorname{Prob}(y = 0) = \operatorname{Prob}(y^* \le 0) = \Phi(-\mu/\sigma) = 1 - \Phi(\mu/\sigma)$, and if $y^* > 0$, then $y$ has the density of $y^*$.

This distribution is a mixture of discrete and continuous parts. The total probability is one, as required, but instead of scaling the second part, we simply assign the full probability in the censored region to the censoring point, in this case, zero.

FIGURE 22.2 Partially Censored Distribution. [Two panels: seats demanded and tickets sold, each with the capacity marked.]

THEOREM 22.3 Moments of the Censored Normal Variable
If $y^* \sim N[\mu, \sigma^2]$ and $y = a$ if $y^* \le a$ or else $y = y^*$, then
$$E[y] = \Phi a + (1 - \Phi)(\mu + \sigma\lambda)$$
and
$$\operatorname{Var}[y] = \sigma^2(1 - \Phi)\left[(1 - \delta) + (\alpha - \lambda)^2\Phi\right],$$
where
$$\Phi[(a - \mu)/\sigma] = \Phi(\alpha) = \operatorname{Prob}(y^* \le a) = \Phi, \qquad \lambda = \phi/(1 - \Phi)$$
and
$$\delta = \lambda^2 - \lambda\alpha.$$
Proof: For the mean,
$$\begin{aligned}
E[y] &= \operatorname{Prob}(y = a) \times E[y \mid y = a] + \operatorname{Prob}(y > a) \times E[y \mid y > a] \\
&= \operatorname{Prob}(y^* \le a) \times a + \operatorname{Prob}(y^* > a) \times E[y^* \mid y^* > a] \\
&= \Phi a + (1 - \Phi)(\mu + \sigma\lambda)
\end{aligned}$$
using Theorem 22.2. For the variance, we use a counterpart to the decomposition in (B-70), that is, $\operatorname{Var}[y] = E[\text{conditional variance}] + \operatorname{Var}[\text{conditional mean}]$, and Theorem 22.2.

For the special case of $a = 0$, the mean simplifies to
$$E[y \mid a = 0] = \Phi(\mu/\sigma)(\mu + \sigma\lambda), \quad \text{where } \lambda = \frac{\phi(\mu/\sigma)}{\Phi(\mu/\sigma)}.$$
For censoring of the upper part of the distribution instead of the lower, it is only necessary to reverse the role of $\Phi$ and $1 - \Phi$ and redefine $\lambda$ as in Theorem 22.2.

Example 22.3 Censored Random Variable
We are interested in the number of tickets demanded for events at a certain arena. Our only measure is the number actually sold. Whenever an event sells out, however, we know that the actual number demanded is larger than the number sold. The number of tickets demanded is censored when it is transformed to obtain the number sold. Suppose that the arena in question has 20,000 seats and, in a recent season, sold out 25 percent of the time. If the average attendance, including sellouts, was 18,000, then what are the mean and standard deviation of the demand for seats? According to Theorem 22.3, the 18,000 is an estimate of
$$E[\text{sales}] = 20{,}000(1 - \Phi) + \Phi[\mu + \sigma\lambda].$$
Since this is censoring from above, rather than below, $\lambda = -\phi(\alpha)/\Phi(\alpha)$. The argument of $\Phi$, $\phi$, and $\lambda$ is $\alpha = (20{,}000 - \mu)/\sigma$. If 25 percent of the events are sellouts, then $\Phi = 0.75$. Inverting the standard normal at 0.75 gives $\alpha = 0.675$. In addition, if $\alpha = 0.675$, then $-\phi(0.675)/0.75 = \lambda = -0.424$. This result provides two equations in $\mu$ and $\sigma$, (a) $18{,}000 = 0.25(20{,}000) + 0.75(\mu - 0.424\sigma)$ and (b) $0.675\sigma = 20{,}000 - \mu$. The solutions are $\sigma = 2426$ and $\mu = 18{,}362$.

For comparison, suppose that we were told that the mean of 18,000 applies only to the events that were not sold out and that, on average, the arena sells out 25 percent of the time. Now our estimates would be obtained from the equations (a) $18{,}000 = \mu - 0.424\sigma$ and (b) $0.675\sigma = 20{,}000 - \mu$. The solutions are $\sigma = 1820$ and $\mu = 18{,}772$.

22.3.2 THE CENSORED REGRESSION (TOBIT) MODEL

The regression model based on the preceding discussion is referred to as the censored regression model or the tobit model. [In reference to Tobin (1958), where the model was first proposed.] The regression is obtained by making the mean in the preceding correspond to a classical regression model. The general formulation is usually given in terms of an index function,
$$\begin{aligned}
y_i^* &= x_i'\beta + \varepsilon_i, \\
y_i &= 0 \quad \text{if } y_i^* \le 0, \\
y_i &= y_i^* \quad \text{if } y_i^* > 0.
\end{aligned} \tag{22-11}$$
There are potentially three conditional mean functions to consider, depending on the purpose of the study. For the index variable, sometimes called the latent variable, $E[y_i^* \mid x_i]$ is $x_i'\beta$. If the data are always censored, however, then this result will usually not be useful. Consistent with Theorem 22.3, for an observation randomly drawn from the population, which may or may not be censored,
$$E[y_i \mid x_i] = \Phi\!\left(\frac{x_i'\beta}{\sigma}\right)(x_i'\beta + \sigma\lambda_i),$$
where
$$\lambda_i = \frac{\phi[(0 - x_i'\beta)/\sigma]}{1 - \Phi[(0 - x_i'\beta)/\sigma]} = \frac{\phi(x_i'\beta/\sigma)}{\Phi(x_i'\beta/\sigma)}. \tag{22-12}$$
Finally, if we intend to confine our attention to uncensored observations, then the results for the truncated regression model apply. The limit observations should not be discarded, however, because the truncated regression model is no more amenable to least squares than the censored data model. It is an unresolved question which of these functions should be used for computing predicted values from this model. Intuition suggests that $E[y_i \mid x_i]$ is correct, but authors differ on this point. For the setting in Example 22.3, for predicting the number of tickets sold, say, to plan for an upcoming event, the censored mean is obviously the relevant quantity. On the other hand, if the objective is to study the need for a new facility, then the mean of the latent variable $y_i^*$ would be more interesting.

There are differences in the marginal effects as well. For the index variable,
$$\frac{\partial E[y_i^* \mid x_i]}{\partial x_i} = \beta.$$
But this result is not what will usually be of interest, since $y_i^*$ is unobserved. For the observed data, $y_i$, the following general result will be useful:¹⁰

¹⁰ See Greene (1999) for the general result and Rosett and Nelson (1975) and Nakamura and Nakamura (1983) for applications based on the normal distribution.

THEOREM 22.4 Marginal Effects in the Censored Regression Model
In the censored regression model with latent regression $y^* = x'\beta + \varepsilon$ and observed dependent variable, $y = a$ if $y^* \le a$, $y = b$ if $y^* \ge b$, and $y = y^*$ otherwise, where $a$ and $b$ are constants, let $f(\varepsilon)$ and $F(\varepsilon)$ denote the density and cdf of $\varepsilon$. Assume that $\varepsilon$ is a continuous random variable with mean 0 and variance $\sigma^2$, and $f(\varepsilon \mid x) = f(\varepsilon)$. Then
$$\frac{\partial E[y \mid x]}{\partial x} = \beta \times \operatorname{Prob}[a < y^* < b].$$

[...]

$$\frac{\partial E[y_i \mid x_i]}{\partial x_i} = \operatorname{Prob}[y_i > 0]\,\frac{\partial E[y_i \mid x_i, y_i > 0]}{\partial x_i} + E[y_i \mid x_i, y_i > 0]\,\frac{\partial \operatorname{Prob}[y_i > 0]}{\partial x_i}.$$
Thus, a change in $x_i$ has two effects: It affects the conditional mean of $y_i^*$ in the positive part of the distribution, and it affects the probability that the observation will fall in that part of the distribution.

Example 22.4 Estimated Tobit Equations for Hours Worked
In their study of the number of hours worked in a survey year by a large sample of wives, Quester and Greene (1982) were interested in whether wives whose marriages were statistically more likely to dissolve hedged against that possibility by spending, on average, more time working. They reported the tobit estimates given in Table 22.1. The last figure in the table implies that a very large proportion of the women reported zero hours, so least squares regression would be inappropriate.

The figures in parentheses are the ratio of the coefficient estimate to the estimated asymptotic standard error. The dependent variable is hours worked in the survey year. "Small kids" is a dummy variable indicating whether there were children in the household. The "education difference" and "relative wage" variables compare husband and wife on these two dimensions. The wage rate used for wives was predicted using a previously estimated regression model and is thus available for all individuals, whether working or not. "Second marriage" is a dummy variable. Divorce probabilities were produced by a large microsimulation model presented in another study [Orcutt, Caldwell, and Wertheimer (1976)]. The variables used here were dummy variables indicating "mean" if the predicted probability was between 0.01 and 0.03 and "high" if it was greater than 0.03. The "slopes" are the marginal effects described earlier.

Note the marginal effects compared with the tobit coefficients. Likewise, the estimate of $\sigma$ is quite misleading as an estimate of the standard deviation of hours worked.

The effects of the divorce probability variables were as expected and were quite large. One of the questions raised in connection with this study was whether the divorce probabilities could reasonably be treated as independent variables. It might be that for these individuals, the number of hours worked was a significant determinant of the probability.

TABLE 22.1 Tobit Estimates of an Hours Worked Equation

| | White Wives: Coefficient | White Wives: Slope | Black Wives: Coefficient | Black Wives: Slope | Least Squares | Scaled OLS |
|---|---|---|---|---|---|---|
| Constant | −1803.13 (−8.64) | | −2753.87 (−9.68) | | | |
| Small kids | −1324.84 (−19.78) | −385.89 | −824.19 (−10.14) | −376.53 | −352.63 | −766.56 |
| Education difference | −48.08 (−4.77) | −14.00 | 22.59 (1.96) | 10.32 | 11.47 | 24.93 |
| Relative wage | 312.07 (5.71) | 90.90 | 286.39 (3.32) | 130.93 | 123.95 | 269.46 |
| Second marriage | 175.85 (3.47) | 51.51 | 25.33 (0.41) | 11.57 | 13.14 | 28.57 |
| Mean divorce probability | 417.39 (6.52) | 121.58 | 481.02 (5.28) | 219.75 | 219.22 | 476.57 |
| High divorce probability | 670.22 (8.40) | 195.22 | 578.66 (5.33) | 264.36 | 244.17 | 530.80 |
| σ | 1559 | 618 | 1511 | 826 | | |
| Sample size | 7459 | | 2798 | | | |
| Proportion working | 0.29 | | 0.46 | | | |

22.3.3 ESTIMATION

Estimation of this model is very similar to that of truncated regression. The tobit model has become so routine and been incorporated in so many computer packages that despite formidable obstacles in years past, estimation is now essentially on the level of ordinary linear regression.¹¹ The log-likelihood for the censored regression model is
$$\ln L = \sum_{y_i > 0} -\frac{1}{2}\left[\log(2\pi) + \ln\sigma^2 + \frac{(y_i - x_i'\beta)^2}{\sigma^2}\right] + \sum_{y_i = 0} \ln\left[1 - \Phi\!\left(\frac{x_i'\beta}{\sigma}\right)\right]. \tag{22-13}$$
The two parts correspond to the classical regression for the nonlimit observations and the relevant probabilities for the limit observations, respectively. This likelihood is a nonstandard type, since it is a mixture of discrete and continuous distributions. In a seminal paper, Amemiya (1973) showed that despite the complications, proceeding in the usual fashion to maximize $\log L$ would produce an estimator with all the familiar desirable properties attained by MLEs.

The log-likelihood function is fairly involved, but Olsen's (1978) reparameterization simplifies things considerably. With $\gamma = \beta/\sigma$ and $\theta = 1/\sigma$, the log-likelihood is
$$\ln L = \sum_{y_i > 0} -\frac{1}{2}\left[\ln(2\pi) - \ln\theta^2 + (\theta y_i - x_i'\gamma)^2\right] + \sum_{y_i = 0} \ln\left[1 - \Phi(x_i'\gamma)\right]. \tag{22-14}$$
The results in this setting are now very similar to those for the truncated regression. The Hessian is always negative definite, so Newton's method is simple to use and usually converges quickly. After convergence, the original parameters can be recovered using $\sigma = 1/\theta$ and $\beta = \gamma/\theta$. The asymptotic covariance matrix for these estimates can be obtained from that for the estimates of $[\gamma, \theta]$ using $\text{Est.Asy.Var}[\hat\beta, \hat\sigma] = \hat{J}\,\text{Asy.Var}[\hat\gamma, \hat\theta]\,\hat{J}'$, where
$$J = \begin{bmatrix} \partial\beta/\partial\gamma' & \partial\beta/\partial\theta \\ \partial\sigma/\partial\gamma' & \partial\sigma/\partial\theta \end{bmatrix} = \begin{bmatrix} (1/\theta)I & (-1/\theta^2)\gamma \\ 0' & (-1/\theta^2) \end{bmatrix}.$$

¹¹ See Hall (1984).

Researchers often compute ordinary least squares estimates despite their inconsistency. Almost without exception, it is found that the OLS estimates are smaller in absolute value than the MLEs. A striking empirical regularity is that the maximum likelihood estimates can often be approximated by dividing the OLS estimates by the proportion of nonlimit observations in the sample.¹² The effect is illustrated in the last two columns of Table 22.1. Another strategy is to discard the limit observations, but we now see that just trades the censoring problem for the truncation problem.

22.3.4 SOME ISSUES IN SPECIFICATION

Two issues that commonly arise in microeconomic data, heteroscedasticity and nonnormality, have been analyzed at length in the tobit setting.¹³

22.3.4.a Heteroscedasticity

Maddala and Nelson (1975), Hurd (1979), Arabmazar and Schmidt (1982a,b), and Brown and Moffitt (1982) all have varying degrees of pessimism regarding how inconsistent the maximum likelihood estimator will be when heteroscedasticity occurs. Not surprisingly, the degree of censoring is the primary determinant. Unfortunately, all the analyses have been carried out in the setting of very specific models (for example, involving only a single dummy variable or one with groupwise heteroscedasticity), so the primary lesson is the very general conclusion that heteroscedasticity emerges as an obviously serious problem.

One can approach the heteroscedasticity problem directly. Petersen and Waldman (1981) present the computations needed to estimate a tobit model with heteroscedasticity of several types. Replacing $\sigma$ with $\sigma_i$ in the log-likelihood function and including $\sigma_i^2$ in the summations produces the needed generality. Specification of a particular model for $\sigma_i$ provides the empirical model for estimation.

Example 22.5 Multiplicative Heteroscedasticity in the Tobit Model
Petersen and Waldman (1981) analyzed the volume of short interest in a cross section of common stocks. The regressors included a measure of the market component of heterogeneous expectations as measured by the firm's BETA coefficient; a company-specific measure of heterogeneous expectations, NONMARKET; the NUMBER of analysts making earnings forecasts for the company; the number of common shares to be issued for the acquisition of another firm, MERGER; and a dummy variable for the existence of OPTIONs. They report the results listed in Table 22.2 for a model in which the variance is assumed to be of the form $\sigma_i^2 = \exp(x_i'\alpha)$. The values in parentheses are the ratio of the coefficient to the estimated asymptotic standard error.

The effect of heteroscedasticity on the estimates is extremely large. We do note, however, a common misconception in the literature. The change in the coefficients is often misleading. The marginal effects in the heteroscedasticity model will generally be very similar to those computed from the model which assumes homoscedasticity. (The calculation is pursued in the exercises.)

A test of the hypothesis that $\alpha = 0$ (except for the constant term) can be based on the likelihood ratio statistic. For these results, the statistic is $-2[-547.3 - (-466.27)] = 162.06$. This statistic has a limiting chi-squared distribution with five degrees of freedom. The sample value exceeds the critical value in the table of 11.07, so the hypothesis can be rejected.

¹² This concept is explored further in Greene (1980b), Goldberger (1981), and Cheung and Goldberger (1984).
¹³ Two symposia that contain numerous results on these subjects are Blundell (1987) and Duncan (1986b). An application that explores these two issues in detail is Melenberg and van Soest (
1996).

TABLE 22.2 Estimates of a Tobit Model (Standard errors in parentheses)

| | Homoscedastic β | Heteroscedastic β | Heteroscedastic α |
|---|---|---|---|
| Constant | −18.28 (5.10) | −4.11 (3.28) | −0.47 (0.60) |
| Beta | 10.97 (3.61) | 2.22 (2.00) | 1.20 (1.81) |
| Nonmarket | 0.65 (7.41) | 0.12 (1.90) | 0.08 (7.55) |
| Number | 0.75 (5.74) | 0.33 (4.50) | 0.15 (4.58) |
| Merger | 0.50 (5.90) | 0.24 (3.00) | 0.06 (4.17) |
| Option | 2.56 (1.51) | 2.96 (2.99) | 0.83 (1.70) |
| Log L | −547.30 | −466.27 | |
| Sample size | 200 | 200 | |

In the preceding example, we carried out a likelihood ratio test against the hypothesis of homoscedasticity. It would be desirable to be able to carry out the test without having to estimate the unrestricted model. A Lagrange multiplier test can be used for that purpose. Consider the heteroscedastic tobit model in which we specify that
$$\sigma_i^2 = \sigma^2 e^{\alpha' w_i}. \tag{22-15}$$
This model is a fairly general specification that includes many familiar ones as special cases. The null hypothesis of homoscedasticity is $\alpha = 0$. (We used this specification in the probit model in Section 19.4.1.b and in the linear regression model in Section 17.7.1.) Using the BHHH estimator of the Hessian as usual, we can produce a Lagrange multiplier statistic as follows: Let $z_i = 1$ if $y_i$ is positive and 0 otherwise,
$$\begin{aligned}
a_i &= z_i\left(\frac{\varepsilon_i}{\sigma^2}\right) + (1 - z_i)\left(\frac{(-1)\lambda_i}{\sigma}\right), \\
b_i &= z_i\left(\frac{\varepsilon_i^2/\sigma^2 - 1}{2\sigma^2}\right) + (1 - z_i)\left(\frac{(x_i'\beta)\lambda_i}{2\sigma^3}\right), \\
\lambda_i &= \frac{\phi(x_i'\beta/\sigma)}{1 - \Phi(x_i'\beta/\sigma)}.
\end{aligned} \tag{22-16}$$
The data vector is $g_i = [a_i x_i', b_i, b_i w_i']'$. The sums are taken over all observations, and all functions involving unknown parameters ($\varepsilon_i$, $\phi_i$, $x_i'\beta$, $\lambda_i$, etc.) are evaluated at the restricted (homoscedastic) maximum likelihood estimates. Then,
$$LM = i'G[G'G]^{-1}G'i = nR^2 \tag{22-17}$$
in the regression of a column of ones on the $K + 1 + P$ derivatives of the log-likelihood function for the model with multiplicative heteroscedasticity, evaluated at the estimates from the restricted model. (If there were no limit observations, then it would reduce to the Breusch–Pagan statistic discussed in Section 11.4.3.) Given the maximum likelihood estimates of the tobit model coefficients, it is quite simple to compute. The statistic has a limiting chi-squared distribution with degrees of freedom equal to the number of variables in $w_i$.

22.3.4.b Misspecification of Prob[y* < 0]

In an early study in this literature, Cragg (1971) proposed a somewhat more general model in which the probability of a limit observation is independent of the regression model for the nonlimit data. One can imagine, for instance, the decision on whether or not to purchase a car as being different from the decision on how much to spend on the car, having decided to buy one. A related problem raised by Lin and Schmidt (1984) is that in the tobit model, a variable that increases the probability of an observation being a nonlimit observation also increases the mean of the variable. They cite as an example loss due to fire in buildings. Older buildings might be more likely to have fires, so that $\partial \operatorname{Prob}[y_i > 0]/\partial \text{age}_i > 0$, but, because of the greater value of newer buildings, older ones incur smaller losses when they do have fires, so that $\partial E[y_i \mid y_i > 0]/\partial \text{age}_i < 0$. This fact would require the coefficient on age to have different signs in the two functions, which is impossible in the tobit model because they are the same coefficient.

A more general model that accommodates these objections is as follows:
1. Decision equation:
$$\begin{aligned}
\operatorname{Prob}[y_i^* > 0] &= \Phi(x_i'\gamma), & z_i = 1 \text{ if } y_i^* > 0, \\
\operatorname{Prob}[y_i^* \le 0] &= 1 - \Phi(x_i'\gamma), & z_i = 0 \text{ if } y_i^* \le 0.
\end{aligned} \tag{22-18}$$
2. Regression equation for nonlimit observations:
$$E[y_i \mid z_i = 1] = x_i'\beta + \sigma\lambda_i,$$
according to Theorem 22.2.

This model is a combination of the truncated regression model of Section 22.2 and the univariate probit model of Section 21.3, which suggests a method of analyzing it. The tobit model of this section arises if $\gamma$ equals $\beta/\sigma$. The parameters of the regression equation can be estimated independently using the truncated regression model of Section 22.2. A recent application is Melenberg and van Soest (1996).

Lin and Schmidt (1984) considered testing the restriction of the tobit model. Based only on the tobit model, they devised a Lagrange multiplier statistic that, although a bit cumbersome algebraically, can be computed without great difficulty. If one is able to estimate the truncated regression model, the tobit model, and the probit model separately, then there is a simpler way to test the hypothesis. The tobit log-likelihood is the sum of the log-likelihoods for the truncated regression and probit models. [To show this result, add and subtract $\sum_{y_i > 0} \ln \Phi(x_i'\beta/\sigma)$ in (22-13). This produces the log-likelihood for the truncated regression model plus (21-20) for the probit model.¹⁴] Therefore, a likelihood ratio statistic can be computed using
$$\lambda = -2[\ln L_T - (\ln L_P + \ln L_{TR})],$$
where
$L_T$ = likelihood for the tobit model in (22-13), with the same coefficients,
$L_P$ = likelihood for the probit model in (19-20), fit separately,
$L_{TR}$ = likelihood for the truncated regression model, fit separately.

¹⁴ The likelihood function for the truncated regression model is considered in the exercises.

22.3.4.c Nonnormality

Nonnormality is an especially difficult problem in this setting. It has been shown that if the underlying disturbances are not normally distributed, then the estimator based on (22-13) is inconsistent. Research is ongoing both on alternative estimators and on methods for testing for this type of misspecification.¹⁵

One approach to the estimation is to use an alternative distribution. Kalbfleisch and Prentice (1980) present a unifying treatment that includes several distributions such as the exponential, lognormal, and Weibull. (Their primary focus is on survival analysis in a medical statistics setting, which is an interesting convergence of the techniques in very different disciplines.) Of course, assuming some other specific distribution does not necessarily solve the problem and may make it worse. A preferable alternative would be to devise an estimator that is robust to changes in the distribution. Powell's (1981, 1984) least absolute deviations (LAD) estimator appears to offer some promise.¹⁶ The main drawback to its use is its computational complexity. An extensive application of the LAD estimator is Melenberg and van Soest (1996). Although estimation in the nonnormal case is relatively difficult, testing for this failure of the model is worthwhile to assess the estimates obtained by the conventional methods. Among the tests that have been developed are Hausman tests, Lagrange multiplier tests [Bera and Jarque (1981, 1982), Bera, Jarque and Lee (1982)], and conditional moment tests [Nelson (1981)]. The conditional moment tests are described in the next section.

To employ a Hausman test, we require an estimator that is consistent and efficient under the null hypothesis but inconsistent under the alternative (the tobit estimator with normality) and an estimator that is consistent under both hypotheses but inefficient under the null hypothesis. Thus, we will require a robust estimator of $\beta$, which restores the difficulties of the previous paragraph. Recent applications [e.g., Melenberg and van Soest (1996)] have used the Hausman test to compare the tobit/normal estimator with Powell's consistent, but inefficient (robust), LAD estimator. Another approach to testing is to embed the normal distribution in some other distribution and then use an LM test for the normal specification. Chesher and Irish (1987) have devised an LM test of normality in the tobit model based on generalized residuals. In many models, including the tobit model, the generalized residuals can be computed as the derivatives of the log-densities with respect to the constant term, so
$$e_i = \frac{1}{\sigma^2}\left[z_i(y_i - x_i'\beta) - (1 - z_i)\sigma\lambda_i\right],$$
where $z_i$ is defined in (22-18) and $\lambda_i$ is defined in (22-16). This residual is an estimate of $\varepsilon_i$ that accounts for the censoring in the distribution. By construction, $E[e_i \mid x_i] = 0$, and if the model actually does contain a constant term, then $\sum_{i=1}^{n} e_i = 0$; this is the first of the necessary conditions for the MLE. The test is then carried out by regressing a column of 1s on $d_i = [e_i x_i', b_i, e_i^3, e_i^4 - 3e_i^4]'$, where $b_i$ is defined in (22-16). Note that the first $K + 1$ variables in $d_i$ are the derivatives of the tobit log-likelihood. Let $D$ be the $n \times (K + 3)$ matrix with $i$th row equal to $d_i'$. Then $D = [G, M]$, where the $K + 1$ columns of $G$ are the derivatives of the tobit log-likelihood and the two columns in $M$ are the last two variables in $d_i$. Then the chi-squared statistic is $nR^2$; that is,
$$LM = i'D(D'D)^{-1}D'i.$$
The necessary conditions that define the MLE are $i'G = 0$, so the first $K + 1$ elements of $i'D$ are zero. Using (B-66), then, the LM statistic becomes
$$LM = i'M[M'M - M'G(G'G)^{-1}G'M]^{-1}M'i,$$
which is a chi-squared statistic with two degrees of freedom. Note the similarity to (22-17), where a test for homoscedasticity is carried out by the same method. As emerges so often in this framework, the test of the distribution actually focuses on the skewness and kurtosis of the residuals.

¹⁵ See Duncan (1983, 1986b), Goldberger (1983), Pagan and Vella (1989), Lee (1996), and Fernandez (1986). We will examine one of the tests more closely in the following section.
¹⁶ See Duncan (1986a,b) for a symposium on the subject and Amemiya (1984). Additional references are Newey, Powell, and Walker (1990); Lee (1996); and Robinson (1988).

22.3.4.d Conditional Moment Tests

Pagan and Vella (1989) [see, as well, Ruud (1984)] describe a set of conditional moment tests of the specification of the tobit model.¹⁷ We will consider three:
1. The variables $z$ have not been erroneously omitted from the model.
2. The disturbances in the model are homoscedastic.
3. The underlying disturbances in the model are normally distributed.

For the third of these, we will take the standard approach of examining the third and fourth moments, which for the normal distribution are 0 and $3\sigma^4$, respectively. The underlying motivation for the tests can be made with reference to the regression part of the tobit model in (22-11),
$$y_i^* = x_i'\beta + \varepsilon_i.$$
Neglecting for the moment that we only observe $y_i^*$ subject to the censoring, the three hypotheses imply the following expectations:
1. $E[z_i(y_i - x_i'\beta)] = 0$,
2. $E\{z_i[(y_i - x_i'\beta)^2 - \sigma^2]\} = 0$,
3. $E[(y_i - x_i'\beta)^3] = 0$ and $E[(y_i - x_i'\beta)^4 - 3\sigma^4] = 0$.

In (1), the variables in $z_i$ would be one or more variables not already in the model. We are interested in assessing whether or not they should be. In (2), presumably, although not necessarily, $z_i$ would be the regressors in the model. For the present, we will assume that $y_i^*$ is observed directly, without censoring. That is, we will construct the CM tests for the classical linear regression model. Then we will go back to the necessary step and make the modification needed to account for the censoring of the dependent variable.

¹⁷ Their survey is quite general and includes other models, specifications, and estimation methods. We will consider only the simplest cases here. The reader is referred to their paper for formal presentation of these results. Developing specification tests for the tobit model has been a popular enterprise. A sampling of the received literature includes Nelson (1981); Bera, Jarque, and Lee (1982); Chesher and Irish (1987); Chesher, Lancaster, and Irish (1985); Gourieroux et al. (1984, 1987); Newey (1986); Rivers and Vuong (1988); Horowitz and Neumann (1989); and Pagan and Vella (1989). Newey (1985a,b) are useful references on the general subject of conditional moment testing. More general treatments of specification testing are Godfrey (1988) and Ruud (1984).
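As an illustration of the generalized residual used in the LM test of normality above, the following is a minimal Python sketch, not the text's own computation. It evaluates $e_i = (1/\sigma^2)[z_i(y_i - x_i'\beta) - (1 - z_i)\sigma\lambda_i]$ for a single observation, taking the index value $x_i'\beta$ and $\sigma$ as given, and then assembles $E[e_i \mid x_i]$ from the two regimes of the censored model using Theorem 22.2. The function names (`tobit_lambda`, `generalized_residual`, `expected_generalized_residual`) are hypothetical and chosen only for this example; the only library assumed is the standard library's `statistics.NormalDist`.

```python
from statistics import NormalDist

_STD_NORMAL = NormalDist()  # standard normal: pdf() is phi, cdf() is Phi


def tobit_lambda(xb: float, sigma: float) -> float:
    """lambda_i = phi(x'b/sigma) / [1 - Phi(x'b/sigma)], as in (22-16)."""
    q = xb / sigma
    return _STD_NORMAL.pdf(q) / (1.0 - _STD_NORMAL.cdf(q))


def generalized_residual(y: float, xb: float, sigma: float) -> float:
    """e_i = (1/sigma^2) [z_i (y_i - x'b) - (1 - z_i) sigma lambda_i]."""
    z = 1.0 if y > 0 else 0.0  # z_i = 1 for a nonlimit observation
    return (z * (y - xb) - (1.0 - z) * sigma * tobit_lambda(xb, sigma)) / sigma**2


def expected_generalized_residual(xb: float, sigma: float) -> float:
    """E[e_i | x_i], built from the nonlimit and limit regimes."""
    q = xb / sigma
    p_pos = _STD_NORMAL.cdf(q)  # Prob[y_i > 0]
    # Theorem 22.2 with mean 0 and truncation point -x'b:
    # E[eps | eps > -x'b] = sigma * phi(q) / Phi(q)
    mean_eps_pos = sigma * _STD_NORMAL.pdf(q) / p_pos
    part_pos = p_pos * mean_eps_pos / sigma**2
    part_limit = (1.0 - p_pos) * (-sigma * tobit_lambda(xb, sigma)) / sigma**2
    return part_pos + part_limit
```

Calling `expected_generalized_residual` at any index value should return a number that is zero up to rounding error, which is the property $E[e_i \mid x_i] = 0$ stated above: the positive contribution $\sigma\phi(q_i)$ from the nonlimit regime is offset exactly by $-\sigma\phi(q_i)$ from the limit regime, where $q_i = x_i'\beta/\sigma$.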
DependentVariableandDurationModels773ConditionalmomenttestsaredescribedinSection17.6.4.Toreview,foramodelestimatedbymaximumlikelihood,thestatisticisC=iM[MM−MG(GG)−1GM]−1Mi,wheretherowsofGarethetermsinthegradientofthelog-likelihoodfunction,(GG)−1istheBHHHestimatoroftheasymptoticcovariancematrixoftheMLEofthemodelparameters,andtherowsofMaretheindividualtermsinthesamplemomentconditions.NotethatthisconstructionisthesameastheLMstatisticjustdiscussed.ThedifferenceisinhowtherowsofMareconstructed.Foraregressionmodelwithoutcensoring,thesamplecounterpartstothemomentrestrictionsin(1)to(3)wouldbe1nr=ze,wheree=y−xbandb=(XX)−1Xy,1iiiiini=11neer=ze2−s2,wheres2=,2iinni=1n1e3r3=4i4.nei−3si=1Forthepositiveobservations,weobservey∗,sotheobservationsinMarethesameasfortheclassicalregressionmodel;thatis,1.m=z(y−xβ),iiii2.m=z[(y−xβ)2−σ2],iiii3.m=[(y−xβ)3,(y−xβ)4−3σ4].iiiiiForthelimitobservations,theseobservationsarereplacedwiththeirexpectedvalues,conditionedony=0,whichmeansthaty∗≤0ore≤−xβ.Letq=(xβ)/σandλ=iiiiiφi/(1−i).Thenfrom(22-2),(22-3b),and(22-4),1.m=zE[(y∗−xβ)|y=0]=z[(xβ−σλ)−xβ]=z(2σλ).iiiiiiiiii2.m=zE[(y∗−xβ)2−σ2|y=0]=z[σ2(1+qλ)−σ2]=z(σ2qλ).iiiiiiiiiiE[ε2|y=0,x]isnotthevariance,sincethemeanisnotzero.)Forthethirdandiifourthmoments,wesimplyreproducePaganandVella’sresults[seealsoGreene(1995a,pp.618–619)]:3.m=σ3λ−2+q2,σq3+q2.iiiiiThesethreeitemsaretheremainingtermsneededtocomputeM.22.3.5CENSORINGANDTRUNCATIONINMODELSFORCOUNTSTruncationandcensoringarerelativelycommoninapplicationsofmodelsforcounts(seeSection21.9).Truncationoftenarisesasaconsequenceofdiscardingwhatappeartobeunusabledata,suchasthezerovaluesinsurveydataonthenumberofusesofrecreationfacilities[Shaw(1988)andBockstaeletal.(1990)].Thezerovaluesinthissettingmightrepresentadiscretedecisionnottovisitthesite,whichisaqualitativelydifferentdecisionfromthepositivenumberforsomeonewhohaddecidedtomakeat\nGreene-50240bookJune28,200217:5774CHAPTER22✦LimitedDependentVariableandDurationModelsleastonevisit.Insuchacase,itmightmakesensetoconfineat
tention to the nonzero observations, thereby truncating the distribution. Censoring, in contrast, is often employed to make survey data more convenient to gather and analyze. For example, survey data on access to medical facilities might ask, "How many trips to the doctor did you make in the last year?" The responses might be 0, 1, 2, 3 or more.

Models with these characteristics can be handled within the Poisson and negative binomial regression frameworks by using the laws of probability to modify the likelihood. For example, in the censored data case,

$$P_i(j) = \text{Prob}[y_i = j] = \frac{e^{-\lambda_i}\lambda_i^j}{j!}, \quad j = 0, 1, 2,$$

$$P_i(3) = \text{Prob}[y_i \ge 3] = 1 - [\text{Prob}(y_i = 0) + \text{Prob}(y_i = 1) + \text{Prob}(y_i = 2)].$$

The probabilities in the model with truncation above zero would be

$$P_i(j) = \text{Prob}[y_i = j] = \frac{e^{-\lambda_i}\lambda_i^j}{[1 - P_i(0)]\,j!} = \frac{e^{-\lambda_i}\lambda_i^j}{[1 - e^{-\lambda_i}]\,j!}, \quad j = 1, 2, \ldots.$$

These models are not appreciably more complicated to analyze than the basic Poisson or negative binomial models. [See Terza (1985b), Mullahy (1986), Shaw (1988), Grogger and Carson (1991), Greene (1998), Lambert (1992), and Winkelmann (1997).] They do, however, bring substantive changes to the familiar characteristics of the models. For example, the conditional means are no longer $\lambda_i$; in the censoring case,

$$E[y_i \mid x_i] = \lambda_i - \sum_{j=3}^{\infty}(j - 3)P_i(j) < \lambda_i.$$

Marginal effects are changed as well. Recall that our earlier result for the count data models was $\partial E[y_i \mid x_i]/\partial x_i = \lambda_i\beta$. With censoring or truncation, it is straightforward in general to show that $\partial E[y_i \mid x_i]/\partial x_i = \delta_i\beta$, but the new scale factor need not be smaller than $\lambda_i$.

22.3.6 APPLICATION: CENSORING IN THE TOBIT AND POISSON REGRESSION MODELS

In 1969, the popular magazine Psychology Today published a 101-question survey on sex and asked its readers to mail in their answers. The results of the survey were discussed in the July 1970 issue. From the approximately 2,000 replies that were collected in electronic form (of about 20,000 received), Professor Ray Fair (1978) extracted a sample of 601 observations on men and women then currently married for the first time and analyzed their responses to a question about extramarital affairs. He used the tobit model as a platform. Fair's analysis in this frequently cited study suggests several interesting econometric questions. [In addition, his 1977 companion paper in Econometrica on estimation of the tobit model contributed to the development of the EM algorith
m, which was published by and is usually associated with Dempster, Laird, and Rubin (1977).]

As noted, Fair used the tobit model as his estimation framework for this study. The nonexperimental nature of the data (which can be downloaded from the Internet at http://fairmodel.econ.yale.edu/rayfair/work.ss.htm) provides a fine laboratory case that we can use to examine the relationships among the tobit, truncated regression, and probit models. In addition, as we will explore below, although the tobit model seems to be a natural choice for the model for these data, a closer look suggests that the models for counts we have examined at several points earlier might be yet a better choice. Finally, the preponderance of zeros in the data that initially motivated the tobit model suggests that even the standard Poisson model, although an improvement, might still be inadequate. In this example, we will reestimate Fair's original model and then apply some of the specification tests and modified models for count data as alternatives.

The study was based on 601 observations on the following variables (full details on data coding are given in the data file and Appendix Table F22.2):

y = number of affairs in the past year, 0, 1, 2, 3, 4–10 coded as 7, "monthly, weekly, or daily," coded as 12. Sample mean = 1.46. Frequencies = (451, 34, 17, 19, 42, 38).
z1 = sex = 0 for female, 1 for male. Sample mean = 0.476.
z2 = age. Sample mean = 32.5.
z3 = number of years married. Sample mean = 8.18.
z4 = children, 0 = no, 1 = yes. Sample mean = 0.715.
z5 = religiousness, 1 = anti, ..., 5 = very. Sample mean = 3.12.
z6 = education, years, 9 = grade school, 12 = high school, ..., 20 = Ph.D or other. Sample mean = 16.2.
z7 = occupation, "Hollingshead scale," 1–7. Sample mean = 4.19.
z8 = self-rating of marriage, 1 = very unhappy, ..., 5 = very happy. Sample mean = 3.93.

The tobit model was fit to y using a constant term and all eight variables. A restricted model was fit by excluding z1, z4, and z6, none of which was individually statistically significant in the model. We are able to match exactly Fair's results for both equations. The log-likelihood functions for the full and restricted models are −704.7311 and −705.5762. The chi-squared statistic for testing the hypothesis that the three coefficients are z
ero is twice the difference, 1.6902. The critical value from the chi-squared distribution with three degrees of freedom is 7.81, so the hypothesis that the coefficients on these three variables are all zero is not rejected. The Wald and Lagrange multiplier statistics are likewise small, 6.59 and 1.681. Based on these results, we will continue the analysis using the restricted set of variables, $Z = (1, z_2, z_3, z_5, z_7, z_8)$. Our interest is solely in the numerical results of different modeling approaches. Readers may draw their own conclusions and interpretations from the estimates.

Table 22.3 presents parameter estimates based on Fair's specification of the normal distribution. The inconsistent least squares estimates appear at the left as a basis for comparison. The maximum likelihood tobit estimates appear next. The sample is heavily dominated by observations with y = 0 (451 of 601, or 75 percent), so the marginal effects are very different from the coefficients: they are smaller by roughly 76.6 percent, since the scale factor is 0.234. The scale factor is computed using the results of Theorem 22.4 for left censoring at zero and the upper limit of $+\infty$, with all variables evaluated at the sample means and the parameters equal to the maximum likelihood estimates:

$$\text{scale} = \Phi\!\left(\frac{+\infty - \bar{x}'\hat{\beta}_{ML}}{\hat{\sigma}_{ML}}\right) - \Phi\!\left(\frac{0 - \bar{x}'\hat{\beta}_{ML}}{\hat{\sigma}_{ML}}\right) = 1 - \Phi\!\left(\frac{0 - \bar{x}'\hat{\beta}_{ML}}{\hat{\sigma}_{ML}}\right) = \Phi\!\left(\frac{\bar{x}'\hat{\beta}_{ML}}{\hat{\sigma}_{ML}}\right) = 0.234.$$

TABLE 22.3 Model Estimates Based on the Normal Distribution (Standard Errors in Parentheses)

            Least      Tobit                               Probit     Truncated Regression
            Squares    Estimate   Marginal   Scaled        Estimate   Estimate   Marginal
                                  Effect     by 1/σ                              Effect
Variable    (1)        (2)        (3)        (4)           (5)        (6)        (7)
Constant    5.61       8.18       —          0.991         0.997      8.32       —
            (0.797)    (2.74)     —          (0.336)       (0.361)    (3.96)     —
z2          −0.0504    −0.179     −0.042     −0.022        −0.022     −0.0841    −0.0407
            (0.0221)   (0.079)    (0.184)    (0.010)       (0.102)    (0.119)    (0.0578)
z3          0.162      0.554      0.130      0.0672        0.0599     0.560      0.271
            (0.0369)   (0.135)    (0.0312)   (0.0161)      (0.0171)   (0.219)    (0.106)
z5          −0.476     −1.69      −0.394     −0.2004       −0.184     −1.502     −0.728
            (0.111)    (0.404)    (0.093)    (0.484)       (0.0515)   (0.617)    (0.299)
z7          0.106      0.326      0.0762     0.0395        0.0375     0.189      0.0916
            (0.0711)   (0.254)    (0.0595)   (0.0308)      (0.0328)   (0.377)    (0.182)
z8          −0.712     −2.29      −0.534     −0.277        −0.273     −1.35      −0.653
            (0.118)    (0.408)    (0.0949)   (0.0483)      (0.0525)   (0.565)    (0.273)
σ           3.09       8.25                                           5.53
log L                  −705.5762                           −307.2955  −329.7103

These estimate
s are shown in the third column. As expected, they resemble the least squares estimates, although not enough that one would be content to use OLS for estimation. The fifth column in Table 22.3 gives estimates of the probit model estimated for the dependent variable $q_i = 0$ if $y_i = 0$, $q_i = 1$ if $y_i > 0$. If the specification of the tobit model is correct, then the probit estimators should be consistent for $(1/\sigma)\beta$ from the tobit model. These estimates, with standard errors computed using the delta method, are shown in column 4. The results are surprisingly close, especially given the results of the specification test considered later. Finally, columns 6 and 7 give the estimates for the truncated regression model that applies to the 150 nonlimit observations if the specification of the model is correct. Here the results seem a bit less consistent.

Several specification tests were suggested for this model. The Cragg/Greene test for appropriate specification of $\text{Prob}[y_i = 0]$ is given in Section 22.3.4.b. This test is easily carried out using the log-likelihood values in the table. The chi-squared statistic, which has seven degrees of freedom, is

$$-2\{-705.5762 - [-307.2955 + (-392.7103)]\} = 11.141,$$

which is smaller than the critical value of 14.067. We conclude that the tobit model is correctly specified (the decision of whether or not is not different from the decision of how many, given "whether"). We now turn to the normality tests. We emphasize that these tests are nonconstructive tests of the skewness and kurtosis of the distribution of $\varepsilon$. A fortiori, if we do reject the hypothesis that these values are 0.0 and 3.0, respectively, then we can reject normality. But that does not suggest what to do next. We turn to that issue later. The Chesher–Irish and Pagan–Vella chi-squared statistics are 562.218 and 22.314, respectively. The critical value is 5.99, so on the basis of both of these values, the hypothesis of normality is rejected. Thus, both the probability model and the distributional framework are rejected by these tests.

Before leaving the tobit model, we consider one additional aspect of the original specification. The values above 4 in the observed data are not true observations on the response; 7 is an estimate of the mean of observations that fall in the ran
ge 4 to 10, whereas 12 was chosen more or less arbitrarily for observations that were greater than 10. These observations represent 80 of the 601 observations, or about 13 percent of the sample. To some extent, this coding scheme might be driving the results. [This point was not overlooked in the original study; "[a] linear specification was used for the estimated equation, and it did not seem reasonable in this case, given the range of explanatory variables, to have a dependent variable that ranged from, say, 0 to 365" [Fair (1978), p. 55].] The tobit model allows for censoring in both tails of the distribution. Ignoring the results of the specification tests for the moment, we will examine a doubly censored regression by recoding all observations that take the values 7 or 12 as 4. The model is thus $y^* = x'\beta + \varepsilon$, $y = 0$ if $y^* \le 0$, $y = y^*$ if 0a) = . Prob(z > a)

To obtain the incidentally truncated marginal density for $y$, we would then integrate $z$ out of this expression. The moments of the incidentally truncated normal distribution are given in Theorem 22.5.[20]

THEOREM 22.5 Moments of the Incidentally Truncated Bivariate Normal Distribution

If $y$ and $z$ have a bivariate normal distribution with means $\mu_y$ and $\mu_z$, standard deviations $\sigma_y$ and $\sigma_z$, and correlation $\rho$, then

$$E[y \mid z > a] = \mu_y + \rho\sigma_y\lambda(\alpha_z),$$
$$\text{Var}[y \mid z > a] = \sigma_y^2[1 - \rho^2\delta(\alpha_z)], \qquad \text{(22-19)}$$

where $\alpha_z = (a - \mu_z)/\sigma_z$, $\lambda(\alpha_z) = \phi(\alpha_z)/[1 - \Phi(\alpha_z)]$, and $\delta(\alpha_z) = \lambda(\alpha_z)[\lambda(\alpha_z) - \alpha_z]$.

[19] We will reconsider the issue of the normality assumption in Section 22.4.5.
[20] Much more general forms of the result that apply to multivariate distributions are given in Johnson and Kotz (1974). See also Maddala (1983, pp. 266–267).

Note that the expressions involving $z$ are analogous to the moments of the truncated distribution of $x$ given in Theorem 22.2. If the truncation is z0]

$$= E[y_i \mid u_i > -w_i'\gamma]$$
$$= x_i'\beta + E[\varepsilon_i \mid u_i > -w_i'\gamma]$$
$$= x_i'\beta + \rho\sigma_\varepsilon\lambda_i(\alpha_u)$$
$$= x_i'\beta + \beta_\lambda\lambda_i(\alpha_u),$$

[21] See, for example, Heckman (1976). This strand of literature begins with an exchange by Gronau (1974) and Lewis (1974).

where $\alpha_u = -w_i'\gamma/\sigma_u$ and $\lambda(\alpha_u) = \phi(w_i'\gamma/\sigma_u)/\Phi(w_i'\gamma/\sigma_u)$. So,

$$y_i \mid z_i^* > 0 = E[y_i \mid z_i^* > 0] + v_i = x_i'\beta + \beta_\lambda\lambda_i(\alpha_u) + v_i.$$

Least squares regression using the observed data, for instance, OLS regression of hours on it
s determinants, using only data for women who are working, produces inconsistent estimates of $\beta$. Once again, we can view the problem as an omitted variable. Least squares regression of $y$ on $x$ and $\lambda$ would be a consistent estimator, but if $\lambda$ is omitted, then the specification error of an omitted variable is committed. Finally, note that the second part of Theorem 22.5 implies that even if $\lambda_i$ were observed, then least squares would be inefficient. The disturbance $v_i$ is heteroscedastic.

The marginal effect of the regressors on $y_i$ in the observed sample consists of two components. There is the direct effect on the mean of $y_i$, which is $\beta$. In addition, for a particular independent variable, if it appears in the probability that $z_i^*$ is positive, then it will influence $y_i$ through its presence in $\lambda_i$. The full effect of changes in a regressor that appears in both $x_i$ and $w_i$ on $y$ is

$$\frac{\partial E[y_i \mid z_i^* > 0]}{\partial x_{ik}} = \beta_k - \gamma_k\left(\frac{\rho\sigma_\varepsilon}{\sigma_u}\right)\delta_i(\alpha_u),$$

where $\delta_i = \lambda_i^2 - \alpha_i\lambda_i$.[22]

Suppose that $\rho$ is positive and $E[y_i]$ is greater when $z_i^*$ is positive than when it is negative. Since $0 < \delta_i < 1$, the additional term serves to reduce the marginal effect. The change in the probability affects the mean of $y_i$ in that the mean in the group $z_i^* > 0$ is higher. The second term in the derivative compensates for this effect, leaving only the marginal effect of a change given that $z_i^* > 0$ to begin with. Consider Example 22.9, and suppose that education affects both the probability of migration and the income in either state. If we suppose that the income of migrants is higher than that of otherwise identical people who do not migrate, then the marginal effect of education has two parts, one due to its influence in increasing the probability of the individual's entering a higher-income group and one due to its influence on income within the group. As such, the coefficient on education in the regression overstates the marginal effect of the education of migrants and understates it for nonmigrants. The sizes of the various parts depend on the setting. It is quite possible that the magnitude, sign, and statistical significance of the effect might all be different from those of the estimate of $\beta$, a point that appears frequently to be overlooked in empirical studies.

In most cases, the selection variable $z^*$ is not observed. Rather, we observe only its sign. To consider our two examples, we typically observe only whether a woman is working or not worki
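ng or whether an individual migrated or not. We can infer the sign of $z^*$, but not its magnitude, from such information. Since there is no information on the scale of $z^*$, the disturbance variance in the selection equation cannot be estimated. (We encountered this problem in Chapter 21 in connection with the probit model.)

[22] We have reversed the sign of $\alpha_u$ in (22-19) since $a = 0$, and $\alpha = \gamma'w/\sigma_u$ is somewhat more convenient. Also, as such, $\partial\lambda/\partial\alpha = -\delta$.

Thus, we reformulate the model as follows:

selection mechanism: $z_i^* = w_i'\gamma + u_i$, $z_i = 1$ if $z_i^* > 0$ and 0 otherwise;
$\text{Prob}(z_i = 1 \mid w_i) = \Phi(w_i'\gamma)$ and
$\text{Prob}(z_i = 0 \mid w_i) = 1 - \Phi(w_i'\gamma)$. (22-20)

regression model: $y_i = x_i'\beta + \varepsilon_i$ observed only if $z_i = 1$,
$(u_i, \varepsilon_i) \sim$ bivariate normal $[0, 0, 1, \sigma_\varepsilon, \rho]$.

Suppose that, as in many of these studies, $z_i$ and $w_i$ are observed for a random sample of individuals but $y_i$ is observed only when $z_i = 1$. This model is precisely the one we examined earlier, with

$$E[y_i \mid z_i = 1, x_i, w_i] = x_i'\beta + \rho\sigma_\varepsilon\lambda(w_i'\gamma).$$

22.4.3 ESTIMATION

The parameters of the sample selection model can be estimated by maximum likelihood.[23] However, Heckman's (1979) two-step estimation procedure is usually used instead. Heckman's method is as follows:[24]

1. Estimate the probit equation by maximum likelihood to obtain estimates of $\gamma$. For each observation in the selected sample, compute $\hat{\lambda}_i = \phi(w_i'\hat{\gamma})/\Phi(w_i'\hat{\gamma})$ and $\hat{\delta}_i = \hat{\lambda}_i(\hat{\lambda}_i + w_i'\hat{\gamma})$.
2. Estimate $\beta$ and $\beta_\lambda = \rho\sigma_\varepsilon$ by least squares regression of $y$ on $x$ and $\hat{\lambda}$.

It is possible also to construct consistent estimators of the individual parameters $\rho$ and $\sigma_\varepsilon$. At each observation, the true conditional variance of the disturbance would be

$$\sigma_i^2 = \sigma_\varepsilon^2(1 - \rho^2\delta_i).$$

The average conditional variance for the sample would converge to

$$\text{plim}\,\frac{1}{n}\sum_{i=1}^{n}\sigma_i^2 = \sigma_\varepsilon^2(1 - \rho^2\bar{\delta}),$$

which is what is estimated by the least squares residual variance $e'e/n$. For the square of the coefficient on $\lambda$, we have

$$\text{plim}\, b_\lambda^2 = \rho^2\sigma_\varepsilon^2,$$

whereas based on the probit results we have

$$\text{plim}\,\frac{1}{n}\sum_{i=1}^{n}\hat{\delta}_i = \bar{\delta}.$$

We can then obtain a consistent estimator of $\sigma_\varepsilon^2$ using

$$\hat{\sigma}_\varepsilon^2 = \frac{1}{n}e'e + \hat{\bar{\delta}}\,b_\lambda^2.$$

[23] See Greene (1995a).
[24] Perhaps in a mimicry of the "tobit" estimator described earlier, this procedure has come to be known as the "Heckit" estimator.

Finally, an estimator of $\rho^2$ is

$$\hat{\rho}^2 = \frac{b_\lambda^2}{\hat{\sigma}_\varepsilon^2},$$

which provide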
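s a complete set of estimators of the model's parameters.[25]

To test hypotheses, an estimate of the asymptotic covariance matrix of $[b', b_\lambda]$ is needed. We have two problems to contend with. First, we can see in Theorem 22.5 that the disturbance term in

$$(y_i \mid z_i = 1, x_i, w_i) = x_i'\beta + \rho\sigma_\varepsilon\lambda_i + v_i \qquad \text{(22-21)}$$

is heteroscedastic;

$$\text{Var}[v_i \mid z_i = 1, x_i, w_i] = \sigma_\varepsilon^2(1 - \rho^2\delta_i).$$

Second, there are unknown parameters in $\lambda_i$. Suppose that we assume for the moment that $\lambda_i$ and $\delta_i$ are known (i.e., we do not have to estimate $\gamma$). For convenience, let $x_i^* = [x_i, \lambda_i]$, and let $b^*$ be the least squares coefficient vector in the regression of $y$ on $x^*$ in the selected data. Then, using the appropriate form of the variance of ordinary least squares in a heteroscedastic model from Chapter 11, we would have to estimate

$$\text{Var}[b^*] = \sigma_\varepsilon^2[X_*'X_*]^{-1}\left[\sum_{i=1}^{n}(1 - \rho^2\delta_i)x_i^*x_i^{*\prime}\right][X_*'X_*]^{-1} = \sigma_\varepsilon^2[X_*'X_*]^{-1}[X_*'(I - \rho^2\Delta)X_*][X_*'X_*]^{-1},$$

where $I - \rho^2\Delta$ is a diagonal matrix with $(1 - \rho^2\delta_i)$ on the diagonal. Without any other complications, this result could be computed fairly easily using $X$, the sample estimates of $\sigma_\varepsilon^2$ and $\rho^2$, and the assumed known values of $\lambda_i$ and $\delta_i$.

The parameters in $\gamma$ do have to be estimated using the probit equation. Rewrite (22-21) as

$$(y_i \mid z_i = 1, x_i, w_i) = \beta'x_i + \beta_\lambda\hat{\lambda}_i + v_i - \beta_\lambda(\hat{\lambda}_i - \lambda_i).$$

In this form, we see that in the preceding expression we have ignored both an additional source of variation in the compound disturbance and correlation across observations; the same estimate of $\gamma$ is used to compute $\hat{\lambda}_i$ for every observation. Heckman has shown that the earlier covariance matrix can be appropriately corrected by adding a term inside the brackets,

$$Q = \hat{\rho}^2(X_*'\hat{\Delta}W)\,\text{Est.Asy.Var}[\hat{\gamma}]\,(W'\hat{\Delta}X_*) = \hat{\rho}^2\hat{F}\hat{V}\hat{F}',$$

where $\hat{V} = \text{Est.Asy.Var}[\hat{\gamma}]$, the estimator of the asymptotic covariance of the probit coefficients. Any of the estimators in (21-22) to (21-24) may be used to compute $\hat{V}$. The complete expression is

$$\text{Est.Asy.Var}[b, b_\lambda] = \hat{\sigma}_\varepsilon^2[X_*'X_*]^{-1}[X_*'(I - \hat{\rho}^2\hat{\Delta})X_* + Q][X_*'X_*]^{-1}.\text{[26]}$$

[25] Note that $\hat{\rho}^2$ is not a sample correlation and, as such, is not limited to [0, 1]. See Greene (1981) for discussion.
[26] This matrix formulation is derived in Greene (1981). Note that the Murphy and Topel (1985) results for two-step estimators given in Theorem 10.3 would apply here as well. Asymptotically, this method would give the same answer. The Heckman formulation has become standard in the literature.

Greene-50240bookJune28,200217:5786CHAPTER22✦LimitedDependentVar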
iableandDurationModels

TABLE 22.7 Estimated Selection Corrected Wage Equation

          Two-Step                 Maximum Likelihood         Least Squares
          Estimate    Std. Err.    Estimate     Std. Err.     Estimate     Std. Err.
β1        −0.971      (2.06)       −0.632       (1.063)       −2.56        (0.929)
β2        0.021       (0.0625)     0.00897      (0.000678)    0.0325       (0.0616)
β3        0.000137    (0.00188)    −0.334d−4    (0.782d−7)    −0.000260    (0.00184)
β4        0.417       (0.100)      0.147        (0.0142)      0.481        (0.0669)
β5        0.444       (0.316)      0.144        (0.0614)      0.449        (0.449)
(ρσ)      −1.100      (0.127)
ρ         −0.340                   −0.131       (0.218)       0.000
σ         3.200                    0.321        (0.00866)     3.111

Example 22.8 Female Labor Supply

Examples 21.1 and 21.4 proposed a labor force participation model for a sample of 753 married women in a sample analyzed by Mroz (1987). The data set contains wage and hours information for the 428 women who participated in the formal market (LFP = 1). Following Mroz, we suppose that for these 428 individuals, the offered wage exceeded the reservation wage and, moreover, the unobserved effects in the two wage equations are correlated. As such, a wage equation based on the market data should account for the sample selection problem. We specify a simple wage model:

$$\text{wage} = \beta_1 + \beta_2\,\text{Exper} + \beta_3\,\text{Exper}^2 + \beta_4\,\text{Education} + \beta_5\,\text{City} + \varepsilon,$$

where Exper is labor market experience and City is a dummy variable indicating that the individual lived in a large urban area. Maximum likelihood, Heckman two-step, and ordinary least squares estimates of the wage equation are shown in Table 22.7. The maximum likelihood estimates are FIML estimates; the labor force participation equation is reestimated at the same time. Only the parameters of the wage equation are shown below. Note as well that the two-step estimator estimates the single coefficient on $\lambda_i$ and the structural parameters $\sigma$ and $\rho$ are deduced by the method of moments. The maximum likelihood estimator computes estimates of these parameters directly. [Details on maximum likelihood estimation may be found in Maddala (1983).]

The differences between the two-step and maximum likelihood estimates in Table 22.7 are surprisingly large. The difference is even more striking in the marginal effects. The effect for education is estimated as 0.417 + 0.0641 for the two-step estimators and 0.149 in total for the maximum likelihood estimates. For the kids variable, the marginal effect is −0.293 for the two-step estimates and only −0.0113 for the MLEs. Surprisingly, the direct test
for a selection effect in the maximum likelihood estimates, a nonzero $\rho$, fails to reject the hypothesis that $\rho$ equals zero.

In some settings, the selection process is a nonrandom sorting of individuals into two or more groups. The mover-stayer model in the next example is a familiar case.

Example 22.9 A Mover Stayer Model for Migration

The model of migration analyzed by Nakosteen and Zimmer (1980) fits into the framework described above. The equations of the model are

net benefit of moving: $M_i^* = w_i'\gamma + u_i$,
income if moves: $I_{i1} = x_{i1}'\beta_1 + \varepsilon_{i1}$,
income if stays: $I_{i0} = x_{i0}'\beta_0 + \varepsilon_{i0}$.

One component of the net benefit is the market wage individuals could achieve if they move, compared with what they could obtain if they stay. Therefore, among the determinants of the net benefit are factors that also affect the income received in either place. An analysis of income in a sample of migrants must account for the incidental truncation of the mover's income on a positive net benefit. Likewise, the income of the stayer is incidentally truncated on a nonpositive net benefit. The model implies an income after moving for all observations, but we observe it only for those who actually do move. Nakosteen and Zimmer (1980) applied the selectivity model to a sample of 9,223 individuals with data for 2 years (1971 and 1973) sampled from the Social Security Administration's Continuous Work History Sample. Over the period, 1,078 individuals migrated and the remaining 8,145 did not. The independent variables in the migration equation were as follows:

SE = self-employment dummy variable; 1 if yes,
EMP = rate of growth of state employment,
PCI = growth of state per capita income,
x = age, race (nonwhite = 1), sex (female = 1),
SIC = 1 if individual changes industry.

The earnings equations included SIC and SE. The authors reported the results given in Table 22.8. The figures in parentheses are asymptotic t ratios.

TABLE 22.8 Estimated Earnings Equations

            Migration           Migrant Earnings    Nonmigrant Earnings
Constant    −1.509              9.041               8.593
SE          −0.708 (−5.72)      −4.104 (−9.54)      −4.161 (−57.71)
EMP         −1.488 (−2.60)      —                   —
PCI         1.455 (3.14)        —                   —
Age         −0.008 (−5.29)      —                   —
Race        −0.065 (−1.17)      —                   —
Sex         −0.082 (−2.14)      —                   —
SIC         0.948 (24.15)       −0.790 (−2.24)      −0.927 (−9.35)
λ           —                   0.212 (0.50)        0.863 (2.84)

22.4.4 TREATMENT EFFECTS

The
basic model of selectivity outlined earlier has been extended in an impressive variety of directions.[27] An interesting application that has found wide use is the measurement of treatment effects and program effectiveness.[28]

An earnings equation that accounts for the value of a college education is

$$\text{earnings}_i = x_i'\beta + \delta C_i + \varepsilon_i,$$

where $C_i$ is a dummy variable indicating whether or not the individual attended college. The same format has been used in any number of other analyses of programs, experiments, and treatments. The question is: Does $\delta$ measure the value of a college education (assuming that the rest of the regression model is correctly specified)? The answer is no if the typical individual who chooses to go to college would have relatively high earnings whether or not he or she went to college. The problem is one of self-selection. If our observation is correct, then least squares estimates of $\delta$ will actually overestimate the treatment effect. The same observation applies to estimates of the treatment effects in other settings in which the individuals themselves decide whether or not they will receive the treatment.

[27] For a survey, see Maddala (1983).
[28] This is one of the fundamental applications of this body of techniques, and is also the setting for the most longstanding and contentious debate on the subject. A Journal of Business and Economic Statistics symposium [Angrist et al. (2001)] raised many of the important questions on whether and how it is possible to measure treatment effects.

To put this in a more familiar context, suppose that we model program participation (e.g., whether or not the individual goes to college) as

$$C_i^* = w_i'\gamma + u_i,$$
$$C_i = 1 \text{ if } C_i^* > 0,\ 0 \text{ otherwise}.$$

We also suppose that, consistent with our previous conjecture, $u_i$ and $\varepsilon_i$ are correlated. Coupled with our earnings equation, we find that

$$E[y_i \mid C_i = 1, x_i, z_i] = x_i'\beta + \delta + E[\varepsilon_i \mid C_i = 1, x_i, z_i] = x_i'\beta + \delta + \rho\sigma_\varepsilon\lambda(-w_i'\gamma) \qquad \text{(22-22)}$$

once again. [See (22-19).] Evidently, a viable strategy for estimating this model is to use the two-step estimator discussed earlier. The net result will be a different estimate of $\delta$ that will account for the self-selected nature of program participation. For nonparticipants, the counterpart to (22-22) is

$$E[y_i \mid C_i = 0, x_i, z_i] = x_i'\beta + \rho\sigma_\varepsilon\left[\frac{-\phi(w_i'\gamma)}{1 - \Phi(w_i'\gamma)}\right].$$

The differen
ce in expected earnings between participants and nonparticipants is, then,

$$E[y_i \mid C_i = 1, x_i, z_i] - E[y_i \mid C_i = 0, x_i, z_i] = \delta + \rho\sigma_\varepsilon\left[\frac{\phi_i}{\Phi_i(1 - \Phi_i)}\right].$$

If the selectivity correction $\lambda_i$ is omitted from the least squares regression, then this difference is what is estimated by the least squares coefficient on the treatment dummy variable. But since (by assumption) all terms are positive, we see that least squares overestimates the treatment effect. Note, finally, that simply estimating separate equations for participants and nonparticipants does not solve the problem. In fact, doing so would be equivalent to estimating the two regressions of Example 22.9 by least squares, which, as we have seen, would lead to inconsistent estimates of both sets of parameters.

There are many variations of this model in the empirical literature. They have been applied to the analysis of education,[29] the Head Start program,[30] and a host of other settings.[31] This strand of literature is particularly important because the use of dummy variable models to analyze treatment effects and program participation has a long history in empirical economics. This analysis has called into question the interpretation of a number of received studies.

[29] Willis and Rosen (1979).
[30] Goldberger (1972).
[31] A useful summary of the issues is Barnow, Cain, and Goldberger (1981). See also Maddala (1983) for a long list of applications. A related application is the switching regression model. See, for example, Quandt (1982, 1988).

22.4.5 THE NORMALITY ASSUMPTION

Some research has cast some skepticism on the selection model based on the normal distribution. [See Goldberger (1983) for an early salvo in this literature.] Among the findings are that the parameter estimates are surprisingly sensitive to the distributional assumption that underlies the model. Of course, this fact in itself does not invalidate the normality assumption, but it does call its generality into question. On the other hand, the received evidence is convincing that sample selection, in the abstract, raises serious problems, distributional questions aside. The literature, for example Duncan (1986b), Manski (1989, 1990), and Heckman (1990), has suggested some promising approaches based on robust and nonparametric
estimators. These approaches obviously have the virtue of greater generality. Unfortunately, the cost is that they generally are quite limited in the breadth of the models they can accommodate. That is, one might gain the robustness of a nonparametric estimator at the cost of being unable to make use of the rich set of accompanying variables usually present in the panels to which selectivity models are often applied. For example, the nonparametric bounds approach of Manski (1990) is defined for two regressors. Other methods [e.g., Duncan (1986b)] allow more elaborate specification.

Recent research includes specific attempts to move away from the normality assumption.[32] An example is Martins (2001), building on Newey (1991), which takes the core specification as given in (22-20) as the platform, but constructs an alternative to the assumption of bivariate normality. Martins' specification modifies the Heckman model by employing an equation of the form

$$E[y_i \mid z_i = 1, x_i, w_i] = x_i'\beta + \mu(w_i'\gamma),$$

where the latter, "selectivity correction" is not the inverse Mills ratio, but some other result from a different model. The correction term is estimated using the Klein and Spady model discussed in Section 21.5.4. This is labeled a "semiparametric" approach. Whether the conditional mean in the selected sample should even remain a linear index function remains to be settled. Not surprisingly, Martins' results, based on two-step least squares, differ only slightly from the conventional results based on normality. This approach is arguably only a fairly small step away from the tight parameterization of the Heckman model. Other non- and semiparametric specifications, e.g., Honore and Kyriazidou (1999, 2000), represent more substantial departures from the normal model, but are much less operational.[33] The upshot is that the issue remains unsettled. For better or worse, the empirical literature on the subject continues to be dominated by Heckman's original model built around the joint normal distribution.

[32] Again, Angrist et al. (2001) is an important contribution to this literature.
[33] This particular work considers selection in a "panel" (mainly two periods). But, the panel data setting for sample selection models is more involved than a cross section analysis. In a panel data set, the "selection" is likely to be a decision at the beginning of Per
iod 1 to be in the data set for all subsequent periods. As such, something more intricate than the model we have considered here is called for.

22.4.6 SELECTION IN QUALITATIVE RESPONSE MODELS

The problem of sample selection has been modeled in other settings besides the linear regression model. In Section 21.6.4, we saw, for example, an application of what amounts to a model of sample selection in a bivariate probit model; a binary response variable $y_i = 1$ if an individual defaults on a loan is observed only if a related variable $z_i$ equals one (the individual is granted a loan). Greene's (1992) application to credit card applications and defaults is similar.

A current strand of literature has developed several models of sample selection for count data models.[34] Terza (1995) models the phenomenon as a form of heterogeneity in the Poisson model. We write

$$y_i \mid \varepsilon_i \sim \text{Poisson}(\lambda_i),$$
$$\ln\lambda_i \mid \varepsilon_i = x_i'\beta + \varepsilon_i. \qquad \text{(22-23)}$$

Then the sample selection is similar to that discussed in the previous sections, with

$$z_i^* = w_i'\gamma + u_i,$$
$$z_i = 1 \text{ if } z_i^* > 0,\ 0 \text{ otherwise},$$

and $[\varepsilon_i, u_i]$ have a bivariate normal distribution with the same specification as in our earlier model. As before, we assume that $[y_i, x_i]$ are only observed when $z_i = 1$. Thus, the effect of the selection is to affect the mean (and variance) of $y_i$, although the effect on the distribution is unclear. In the observed data, $y_i$ no longer has a Poisson distribution. Terza (1998), Terza and Kenkel (2001) and Greene (1997a) suggested a maximum likelihood approach for estimation.

22.5 MODELS FOR DURATION DATA[35]

Intuition might suggest that the longer a strike persists, the more likely it is that it will end within, say, the next week. Or is it? It seems equally plausible to suggest that the longer a strike has lasted, the more difficult must be the problems that led to it in the first place, and hence the less likely it is that it will end in the next short time interval. A similar kind of reasoning could be applied to spells of unemployment or the interval between conceptions. In each of these cases, it is not only the duration of the event, per se, that is interesting, but also the likelihood that the event will end in "the next period" given that it has lasted as long as it has.

Analysis of the length of time until failure has interested engineers for decades. For
example, the models discussed in this section were applied to the durability of electric and electronic components long before economists discovered their usefulness.

[34] See, for example, Bockstael et al. (1990), Smith (1988), Brannas (1995), Greene (1994, 1995c, 1997a), Weiss (1995), and Terza (1995, 1998), and Winkelmann (1997).
[35] There are a large number of highly technical articles on this topic but relatively few accessible sources for the uninitiated. A particularly useful introductory survey is Kiefer (1988), upon which we have drawn heavily for this section. Other useful sources are Kalbfleisch and Prentice (1980), Heckman and Singer (1984a), Lancaster (1990) and Florens, Fougere, and Mouchart (1996).

Likewise, the analysis of survival times (for example, the length of survival after the onset of a disease or after an operation such as a heart transplant) has long been a staple of biomedical research. Social scientists have recently applied the same body of techniques to strike duration, length of unemployment spells, intervals between conception, time until business failure, length of time between arrests, length of time from purchase until a warranty claim is made, intervals between purchases, and so on.

This section will give a brief introduction to the econometric analysis of duration data. As usual, we will restrict our attention to a few straightforward, relatively uncomplicated techniques and applications, primarily to introduce terms and concepts. The reader can then wade into the literature to find the extensions and variations. We will concentrate primarily on what are known as parametric models. These apply familiar inference techniques and provide a convenient departure point. Alternative approaches are considered at the end of the discussion.

22.5.1 DURATION DATA

The variable of interest in the analysis of duration is the length of time that elapses from the beginning of some event either until its end or until the measurement is taken, which may precede termination. Observations will typically consist of a cross section of durations, $t_1, t_2, \ldots, t_n$. The process being observed may have begun at different points in calendar time for the different individuals in the sample. For example, the strik
e duration data examined in Example 22.10 are drawn from nine different years.

Censoring is a pervasive and usually unavoidable problem in the analysis of duration data. The common cause is that the measurement is made while the process is ongoing. An obvious example can be drawn from medical research. Consider analyzing the survival times of heart transplant patients. Although the beginning times may be known with precision, at the time of the measurement, observations on any individuals who are still alive are necessarily censored. Likewise, samples of spells of unemployment drawn from surveys will probably include some individuals who are still unemployed at the time the survey is taken. For these individuals, duration, or survival, is at least the observed $t_i$, but not equal to it. Estimation must account for the censored nature of the data for the same reasons as considered in Section 22.3. The consequences of ignoring censoring in duration data are similar to those that arise in regression analysis.

In a conventional regression model that characterizes the conditional mean and variance of a distribution, the regressors can be taken as fixed characteristics at the point in time or for the individual for which the measurement is taken. When measuring duration, the observation is implicitly on a process that has been under way for an interval of time from zero to $t$. If the analysis is conditioned on a set of covariates (the counterparts to regressors) $x_t$, then the duration is implicitly a function of the entire time path of the variable $x(t)$, $t = (0, t)$, which may have changed during the interval. For example, the observed duration of employment in a job may be a function of the individual's rank in the firm. But their rank may have changed several times between the time they were hired and when the observation was made. As such, observed rank at the end of the job tenure is not necessarily a complete description of the individual's rank while they were employed. Likewise, marital status, family size, and amount of education are all variables that can change during the duration of unemployment and that one would like to account for in the duration model. The treatment of time-varying covariates is a considerable complication.[36]

22.5.2 A REGRESSION
-LIKEAPPROACH:PARAMETRICMODELSOFDURATIONWewillusethetermspellasacatchallforthedifferentdurationvariableswemightmeasure.SpelllengthisrepresentedbytherandomvariableT.Asimpleapproachtodurationanalysiswouldbetoapplyregressionanalysistothesampleofobservedspells.Bythisdevice,wecouldcharacterizetheexpectedduration,perhapsconditionedonasetofcovariateswhosevaluesweremeasuredattheendoftheperiod.WecouldalsoassumethatconditionedonanxthathasremainedfixedfromT=0toT=t,thasanormaldistribution,aswecommonlydoinregression.Wecouldthencharacterizetheprobabilitydistributionofobserveddurationtimes.But,normalityturnsoutnottobeparticularlyattractiveinthissettingforanumberofreasons,notleastofwhichisthatdurationispositivebyconstruction,whileanormallydistributedvariablecantakenegativevalues.(Lognormalityturnsouttobeapalatablealternative,butitisonlyoneamongalonglistofcandidates.)22.5.2.aTheoreticalBackgroundSupposethattherandomvariableThasacontinuousprobabilitydistributionf(t),wheretisarealizationofT.ThecumulativeprobabilityistF(t)=f(s)ds=Prob(T≤t).0Wewillusuallybemoreinterestedintheprobabilitythatthespellisoflengthatleastt,whichisgivenbythesurvivalfunction,S(t)=1−F(t)=Prob(T≥t).Considerthequestionraisedintheintroduction:Giventhatthespellhaslasteduntiltimet,whatistheprobabilitythatitwillendinthenextshortintervaloftime,sayt?Itisl(t,t)=Prob(t≤T≤t+t|T≥t).Ausefulfunctionforcharacterizingthisaspectofthedistributionisthehazardrate,Prob(t≤T≤t+t|T≥t)F(t+t)−F(t)f(t)λ(t)=lim=lim=.t→0tt→0tS(t)S(t)Roughly,thehazardrateistherateatwhichspellsarecompletedafterdurationt,giventhattheylastatleastuntilt.Assuch,thehazardfunctiongivesananswertoouroriginalquestion.Thehazardfunction,thedensity,theCDFandthesurvivalfunctionareallrelated.Thehazardfunctionis−dlnS(t)λ(t)=dt36SeePetersen(1986)foroneapproachtothisproblem.\nGreene-50240bookJune28,200217:5CHAPTER22✦LimitedDependentVariableandDurationModels793sof(t)=S(t)λ(t).Anotherusefulfunctionistheintegratedhazardfunctiont(t)=λ(s)ds,0forwhichS(t)=e−(t),so(t)=−lnS(t).Theinte
gratedhazardfunctionisgeneralizedresidualinthissetting.[SeeChesherandIrish(1987)andExample22.10.]22.5.2.bModelsoftheHazardFunctionForpresentpurposes,thehazardfunctionismoreinterestingthanthesurvivalrateorthedensity.Basedonthepreviousresults,onemightconsidermodelingthehazardfunctionitself,ratherthan,say,modelingthesurvivalfunctionthenobtainingthedensityandthehazard.Forexample,thebasecaseformanyanalysesisahazardratethatdoesnotvaryovertime.Thatis,λ(t)isaconstantλ.Thisischaracteristicofaprocessthathasnomemory;theconditionalprobabilityof“failure”inagivenshortintervalisthesameregardlessofwhentheobservationismade.Thus,λ(t)=λ.Fromtheearlierdefinition,weobtainthesimpledifferentialequation,−dlnS(t)=λ.dtThesolutionislnS(t)=k−λtorS(t)=Ke−λt,whereKistheconstantofintegration.TheterminalconditionthatS(0)=1impliesthatK=1,andthesolutionisS(t)=e−λt.Thissolutionistheexponentialdistribution,whichhasbeenusedtomodelthetimeuntilfailureofelectroniccomponents.Estimationofλissimple,sincewithanexpo-nentialdistribution,E[t]=1/λ.Themaximumlikelihoodestimatorofλwouldbethereciprocalofthesamplemean.Anaturalextensionmightbetomodelthehazardrateasalinearfunction,λ(t)=α+βt.Then(t)=αt+1βt2andf(t)=λ(t)S(t)=λ(t)exp[−(t)].Toavoidanega-2tivehazardfunction,onemightdepartfromλ(t)=exp[g(t,θ)],whereθisavectorofparameterstobeestimated.Withanobservedsampleofdurations,estimationofαand\nGreene-50240bookJune28,200217:5794CHAPTER22✦LimitedDependentVariableandDurationModelsTABLE22.9SurvivalDistributionsDistributionHazardFunction,λ(t)SurvivalFunction,S(t)Exponentialλ,S(t)=e−λtWeibullλp(λt)p−1,S(t)=e−(λt)pLognormalf(t)=(p/t)φ[pln(λt)]S(t)=[−pln(λt)][lntisnormallydistributedwithmean−lnλandstandarddeviation1/p.]Loglogisticλ(t)=λp(λt)p−1/[1+(λt)p],S(t)=1/[1+(λt)p][lnthasalogisticdistributionwithmean−lnλandvarianceπ2/(3p2).]βis,atleastinprinciple,astraightforwardprobleminmaximumlikelihood.[Kennan(1985)usedasimilarapproach.]Adistributionwhosehazardfunctionslopesupwardissaidtohavepositivedurationdependence.Forsuchdistributio
ns,thelikelihoodoffailureattimet,conditionalupondurationuptotimet,isincreasingint.Theoppositecaseisthatofdecreasinghazardornegativedurationdependence.Ourquestionintheintroductionaboutwhetherthestrikeismoreorlesslikelytoendattimetgiventhatithaslasteduntiltimetcanbeframedintermsofpositiveornegativedurationdependence.Theassumeddistributionhasaconsiderablebearingontheanswer.Ifoneisunsureattheoutsetoftheanalysiswhetherthedatacanbecharacterizedbypositiveornegativedurationdependence,thenitiscounterproductivetoassumeadistributionthatdisplaysonecharacteristicortheotherovertheentirerangeoft.Thus,theexponentialdistributionandoursug-gestedextensioncouldbeproblematic.Theliteraturecontainsacornucopiaofchoicesfordurationmodels:normal,inversenormal[inverseGaussian;seeLancaster(1990)],lognormal,F,gamma,Weibull(whichisapopularchoice),andmanyothers.37Toillustratethedifferences,wewillexamineafewofthesimplerones.Table22.9liststhehazardfunctionsandsurvivalfunctionsforfourcommonlyuseddistributions.Eachinvolvestwoparameters,alocationparameter,λandascaleparameter,p.[Notethatinthebenchmarkcaseoftheexponentialdistribution,λisthehazardfunction.Inallothercases,thehazardfunctionisafunctionofλ,pand,wherethereisdurationdependence,taswell.Differentauthors,e.g.,Kiefer(1988),usedifferentparameterizationsofthesemodels;WefollowtheconventionofKalbfleischandPrentice(1980).]Allthesearedistributionsforanonnegativerandomvariable.Theirhazardfunc-tionsdisplayverydifferentbehaviors,ascanbeseeninFigure22.4.Thehazardfunctionfortheexponentialdistributionisconstant,thatfortheWeibullismonotonicallyin-creasingordecreasingdependingonp,andthehazardsforlognormalandloglogisticdistributionsfirstincreaseandthendecrease.Whichamongtheseorthemanyalterna-tivesislikelytobebestinanyapplicationisuncertain.22.5.2.cMaximumLikelihoodEstimationTheparametersλandpofthesemodelscanbeestimatedbymaximumlikelihood.Forobserveddurationdata,t1,t2,...,tn,thelog-likelihoodfunctioncanbeformulatedandmaximizedinthewayswehavebecomefamiliarwithinear
lierchapters.CensoredobservationscanbeincorporatedasinSection22.3forthetobitmodel.[See(22-13).]37ThreesourcesthatcontainnumerousspecificationsareKalbfleischandPrentice(1980),CoxandOakes(1985),andLancaster(1990).\nGreene-50240bookJune28,200217:5CHAPTER22✦LimitedDependentVariableandDurationModels795Hazardfunction0.040Lognormal0.032ExponentialLoglogistic0.024Weibull0.0160.0080020406080100DaysFIGURE22.4ParametricHazardFunctions.Assuch,lnL(θ)=lnf(t|θ)+lnS(t|θ),uncensoredcensoredobservationsobservationswhereθ=(λ,p).Forsomedistributions,itisconvenienttoformulatethelog-likelihoodfunctionintermsoff(t)=λ(t)S(t)sothatlnL=λ(t|θ)+lnS(t|θ).uncensoredallobservationsobservationsInferenceabouttheparameterscanbedoneintheusualway.EithertheBHHHestima-tororactualsecondderivativescanbeusedtoestimateasymptoticstandarderrorsfortheestimates.Thetransformationw=p(lnt+lnλ)forthesedistributionsgreatlyfacil-itatesmaximumlikelihoodestimation.Forexample,fortheWeibullmodel,bydefiningw=p(lnt+lnλ),weobtaintheverysimpledensityf(w)=exp[w−exp(w)]andsur-vivalfunctionS(w)=exp(−exp(w)).38Therefore,byusinglntinsteadoft,wegreatlysimplifythelog-likelihoodfunction.DetailsfortheseandseveralotherdistributionsmaybefoundinKalbfleischandPrentice(1980,pp.56–60).TheWeibulldistributionisexaminedindetailinthenextsection.38Thetransformationisexp(w)=(λt)psot=(1/λ)[exp(w)]1/p.TheJacobianofthetransformationisdt/dw=[exp(w)]1/p/(λp).ThedensityinTable22.9isλp[exp(w)]−(1/p)−1[exp(−exp(w))].MultiplyingbytheJacobianproducestheresult,f(w)=exp[w−exp(w)].Thesurvivalfunctionistheantiderivative,[exp(−exp(w))].\nGreene-50240bookJune28,200217:5796CHAPTER22✦LimitedDependentVariableandDurationModels22.5.2.dExogenousVariablesOnelimitationofthemodelsgivenaboveisthatexternalfactorsarenotgivenaroleinthesurvivaldistribution.Theadditionof“covariates”todurationmodelsisfairlystraightforward,althoughtheinterpretationofthecoefficientsinthemodelislessso.Consider,forexample,theWeibullmodel.(Theextensiontootherdistributionswillbesimilar.)Let−xβλi=ei,
where $\mathbf{x}_i$ is a constant term and a set of variables that are assumed not to change from time $T = 0$ until the “failure time,” $T = t_i$. Making $\lambda_i$ a function of a set of regressors is equivalent to changing the units of measurement on the time axis. For this reason, these models are sometimes called accelerated failure time models. Note as well that in all the models listed (and generally), the regressors do not bear on the question of duration dependence, which is a function of $p$.

Let $\sigma = 1/p$ and let $\delta_i = 1$ if the spell is completed and $\delta_i = 0$ if it is censored. As before, let

$$w_i = p \ln(\lambda_i t_i) = \frac{\ln t_i - \mathbf{x}_i'\boldsymbol{\beta}}{\sigma}$$

and denote the density and survival functions $f(w_i)$ and $S(w_i)$. The observed random variable is

$$\ln t_i = \sigma w_i + \mathbf{x}_i'\boldsymbol{\beta}.$$

The Jacobian of the transformation from $w_i$ to $\ln t_i$ is $dw_i/d\ln t_i = 1/\sigma$ so the density and survival functions for $\ln t_i$ are

$$f(\ln t_i \mid \mathbf{x}_i, \boldsymbol{\beta}, \sigma) = \frac{1}{\sigma} f\!\left(\frac{\ln t_i - \mathbf{x}_i'\boldsymbol{\beta}}{\sigma}\right) \quad \text{and} \quad S(\ln t_i \mid \mathbf{x}_i, \boldsymbol{\beta}, \sigma) = S\!\left(\frac{\ln t_i - \mathbf{x}_i'\boldsymbol{\beta}}{\sigma}\right).$$

The log-likelihood for the observed data is

$$\ln L(\boldsymbol{\beta}, \sigma \mid \text{data}) = \sum_{i=1}^{n} \left[\delta_i \ln f(\ln t_i \mid \mathbf{x}_i, \boldsymbol{\beta}, \sigma) + (1 - \delta_i) \ln S(\ln t_i \mid \mathbf{x}_i, \boldsymbol{\beta}, \sigma)\right].$$

For the Weibull model, for example (see footnote 38),

$$f(w_i) = \exp(w_i - e^{w_i})$$

and

$$S(w_i) = \exp(-e^{w_i}).$$

Making the transformation to $\ln t_i$ and collecting terms reduces the log-likelihood to

$$\ln L(\boldsymbol{\beta}, \sigma \mid \text{data}) = \sum_{i} \left[\delta_i \left(\frac{\ln t_i - \mathbf{x}_i'\boldsymbol{\beta}}{\sigma} - \ln \sigma\right) - \exp\!\left(\frac{\ln t_i - \mathbf{x}_i'\boldsymbol{\beta}}{\sigma}\right)\right].$$

(Many other distributions, including the others in Table 22.9, simplify in the same way. The exponential model is obtained by setting $\sigma$ to one.) The derivatives can be equated to zero using the methods described in Appendix E. The individual terms can also be used to form the BHHH estimator of the asymptotic covariance matrix for the estimator.^39 The Hessian is also simple to derive, so Newton’s method could be used instead.^40

Note that the hazard function generally depends on $t$, $p$, and $\mathbf{x}$. The sign of an estimated coefficient suggests the direction of the effect of the variable on the hazard function when the hazard is monotonic. But in those cases, such as the loglogistic, in which the hazard is nonmonotonic, even this may be ambiguous. The magnitudes of the effects may also be difficult to interpret in terms of the hazard function. In a few cases, we do get a regression-like interpretation. In the Weibull and exponential models, $E[t \mid \mathbf{x}_i] = \exp(\mathbf{x}_i'\boldsymbol{\beta})\,\Gamma[(1/p) + 1]$, whereas for the lognormal and loglogistic models, $E[\ln t \mid \mathbf{x}_i] = \mathbf{x}_i'\boldsymbol{\beta}$. In these cases, $\beta_k$ is the derivative (or a multiple of the derivative) of this conditional mean. For some other distributions, the conditional median of $t$ is easily obtained. Numerous cases are discussed by Kiefer (1988), Kalbfleisch and Prentice (1980), and Lancaster (1990).

^39 Note that the log-likelihood function has the same form as that for the tobit model in Section 22.3. By just reinterpreting the nonlimit observations in a tobit setting, we can, therefore, use this framework to apply a wide range of distributions to the tobit model. [See Greene (1995a) and references given therein.]

^40 See Kalbfleisch and Prentice (1980) for numerous other examples.

22.5.2.e Heterogeneity

The problem of heterogeneity in duration models can be viewed essentially as the result of an incomplete specification. Individual specific covariates are intended to incorporate observation specific effects. But if the model specification is incomplete and if systematic individual differences in the distribution remain after the observed effects are accounted for, then inference based on the improperly specified model is likely to be problematic. We have already encountered several settings in which the possibility of heterogeneity mandated a change in the model specification; the fixed and random effects regression, logit, and probit models all incorporate observation-specific effects. Indeed, all the failures of the linear regression model discussed in the preceding chapters can be interpreted as a consequence of heterogeneity arising from an incomplete specification.

There are a number of ways of extending duration models to account for heterogeneity. The strictly nonparametric approach of the Kaplan–Meier estimator (see Section 22.5.3) is largely immune to the problem, but it is also rather limited in how much information can be culled from it. One direct approach is to model heterogeneity in the parametric model. Suppose that we posit a survival function conditioned on the individual specific effect $v_i$. We treat the survival function as $S(t_i \mid v_i)$. Then add to that a model for the unobserved heterogeneity $f(v_i)$. (Note that this is a counterpart to the incorporation of a disturbance in a regression model and follows the same procedures that we used in the Poisson model with random effects.) Then

$$S(t) = E_v[S(t \mid v)] = \int_v S(t \mid v) f(v)\,dv.$$

The gamma distribution is frequently used for this purpose.^41 Consider, for example, using this device to incorporate heterogeneity into the Weibull model we used earlier. As is typical, we assume that $v$ has a gamma distribution with mean 1 and variance $\theta = 1/k$. Then

$$f(v) = \frac{k^k}{\Gamma(k)} e^{-kv} v^{k-1}$$

and

$$S(t \mid v) = e^{-(v\lambda t)^p}.$$

After a bit of manipulation, we obtain the unconditional distribution,

$$S(t) = \int_0^\infty S(t \mid v) f(v)\,dv = [1 + \theta(\lambda t)^p]^{-1/\theta}.$$

The limiting value, with $\theta = 0$, is the Weibull survival model, so $\theta = 0$ corresponds to $\text{Var}[v] = 0$, or no heterogeneity.^42 The hazard function for this model is

$$\lambda(t) = \lambda p(\lambda t)^{p-1}[S(t)]^{\theta},$$

which shows the relationship to the Weibull model.

^41 See, for example, Hausman, Hall, and Griliches (1984), who use it to incorporate heterogeneity in the Poisson regression model. The application is developed in Section 21.9.5.

This approach is common in parametric modeling of heterogeneity. In an important paper on this subject, Heckman and Singer (1984b) argued that this approach tends to overparameterize the survival distribution and can lead to rather serious errors in inference. They gave some dramatic examples to make the point. They also expressed some concern that researchers tend to choose the distribution of heterogeneity more on the basis of mathematical convenience than on any sensible economic basis.

22.5.3 OTHER APPROACHES

The parametric models are attractive for their simplicity. But by imposing as much structure on the data as they do, the models may distort the estimated hazard rates. It may be that a more accurate representation can be obtained by imposing fewer restrictions.

The Kaplan–Meier (1958) product limit estimator is a strictly empirical, nonparametric approach to survival and hazard function estimation. Assume that the observations on duration are sorted in ascending order so that $t_1 \le t_2$ and so on and, for now, that no observations are censored. Suppose as well that there are $K$ distinct survival times in the data, denoted $T_k$; $K$ will equal $n$ unless there are ties. Let $n_k$ denote the number of individuals whose observed duration is at least $T_k$. The set of individuals whose duration is at least $T_k$ is called the risk set at this duration. (We borrow, once again, from biostatistics, where the risk set is those individuals still “at risk” at
time $T_k$). Thus, $n_k$ is the size of the risk set at time $T_k$. Let $h_k$ denote the number of observed spells completed at time $T_k$. A strictly empirical estimate of the survivor function would be

$$\hat{S}(T_k) = \prod_{i=1}^{k} \frac{n_i - h_i}{n_i} = \frac{n_k - h_k}{n_1}.$$

^42 For the strike data analyzed earlier, the maximum likelihood estimate of $\theta$ is 0.0004, which suggests that at least in the context of the Weibull model, heterogeneity does not appear to be a problem.

The estimator of the hazard rate is

$$\hat{\lambda}(T_k) = \frac{h_k}{n_k}. \qquad \text{(22-24)}$$

Corrections are necessary for observations that are censored. Lawless (1982), Kalbfleisch and Prentice (1980), Kiefer (1988), and Greene (1995a) give details. Susin (2001) points out a fundamental ambiguity in this calculation (one which he argues appears in the 1958 source). The estimator in (22-24) is not a “rate” as such, as the width of the time window is undefined, and could be very different at different points in the chain of calculations. Since many intervals, particularly those late in the observation period, might have zeros, the failure to acknowledge these intervals should impart an upward bias to the estimator. His proposed alternative computes the counterpart to (22-24) over a mesh of defined intervals as follows:

$$\hat{\lambda}\!\left(I_a^b\right) = \frac{\sum_{j=a}^{b} h_j}{\sum_{j=a}^{b} n_j b_j},$$

where the interval is from $t = a$ to $t = b$, $h_j$ is the number of failures in each period in this interval, $n_j$ is the number of individuals at risk in that period and $b_j$ is the width of the period. Thus, an interval $[a, b)$ is likely to include several “periods.”

Cox’s (1972) approach to the proportional hazard model is another popular, semiparametric method of analyzing the effect of covariates on the hazard rate. The model specifies that

$$\lambda(t_i) = \exp(-\mathbf{x}_i'\boldsymbol{\beta})\,\lambda_0(t_i).$$

The function $\lambda_0$ is the “baseline” hazard, which is the individual heterogeneity. In principle, this hazard is a parameter for each observation that must be estimated. Cox’s partial likelihood estimator provides a method of estimating $\boldsymbol{\beta}$ without requiring estimation of $\lambda_0$. The estimator is somewhat similar to Chamberlain’s estimator for the logit model with panel data in that a conditioning operation is used to remove the heterogeneity. (See Section 21.5.1.b.) Suppose that the sample contains $K$ distinct exit times, $T_1, \ldots, T_K$. For any time $T_k$, the risk set, denoted $R_k$, is all individuals whose exit time is at least $T_k$. The risk set is defined with respect to any moment in time $T$ as the set of individuals who have not yet exited just prior to that time. For every individual $i$ in risk set $R_k$, $t_i \ge T_k$. The probability that an individual exits at time $T_k$ given that exactly one individual exits at this time (which is the counterpart to the conditioning in the binary logit model in Chapter 21) is

$$\text{Prob}[t_i = T_k \mid \text{risk set}_k] = \frac{e^{\boldsymbol{\beta}'\mathbf{x}_i}}{\sum_{j \in R_k} e^{\boldsymbol{\beta}'\mathbf{x}_j}}.$$

Thus, the conditioning sweeps out the baseline hazard functions. For the simplest case in which exactly one individual exits at each distinct exit time and there are no censored observations, the partial log-likelihood is

$$\ln L = \sum_{k=1}^{K} \left[\boldsymbol{\beta}'\mathbf{x}_k - \ln \sum_{j \in R_k} e^{\boldsymbol{\beta}'\mathbf{x}_j}\right].$$

If $m_k$ individuals exit at time $T_k$, then the contribution to the log-likelihood is the sum of the terms for each of these individuals.

The proportional hazard model is a common choice for modeling durations because it is a reasonable compromise between the Kaplan–Meier estimator and the possibly excessively structured parametric models. Hausman and Han (1990) and Meyer (1988), among others, have devised other, “semiparametric” specifications for hazard models.

TABLE 22.10 Estimated Duration Models (Estimated Standard Errors in Parentheses)

Model | $\lambda$ | $p$ | Median Duration
Exponential | 0.02344 (0.00298) | 1.00000 (0.00000) | 29.571 (3.522)
Weibull | 0.02439 (0.00354) | 0.92083 (0.11086) | 27.543 (3.997)
Loglogistic | 0.04153 (0.00707) | 1.33148 (0.17201) | 24.079 (4.102)
Lognormal | 0.04514 (0.00806) | 0.77206 (0.08865) | 22.152 (3.954)

Example 22.10 Survival Models for Strike Duration

The strike duration data given in Kennan (1985, pp. 14–16) have become a familiar standard for the demonstration of hazard models. Appendix Table F22.1 lists the durations in days of 62 strikes that commenced in June of the years 1968 to 1976. Each involved at least 1,000 workers and began at the expiration or reopening of a contract. Kennan reported the actual duration. In his survey, Kiefer, using the same observations, censored the data at 80 days to demonstrate the effects of censoring. We have kept the data in their original form; the interested reader is referred to Kiefer for further analysis of the censoring problem.^43

^43 Our statistical results are nearly the same as Kiefer’s despite the censoring.

Parameter estimates for the four duration models are given in Table 22.10. The estimate of the median of the survival distribution is obtained by solving the equation $S(t) = 0.5$. For example, for the Weibull model,

$$S(M) = 0.5 = \exp[-(\lambda M)^p]$$

or

$$M = [(\ln 2)^{1/p}]/\lambda.$$

For the exponential model, $p = 1$. For the lognormal and loglogistic models, $M = 1/\lambda$. The delta method is then used to estimate the standard error of this function of the parameter estimates. (See Section 5.2.4.) All these distributions are skewed to the right. As such, $E[t]$ is greater than the median. For the exponential and Weibull models, $E[t] = [1/\lambda]\,\Gamma[(1/p) + 1]$; for the lognormal, $E[t] = (1/\lambda)[\exp(1/p^2)]^{1/2}$. The implied hazard functions are shown in Figure 22.4.

The variable $x$ reported with the strike duration data is a measure of unanticipated aggregate industrial production net of seasonal and trend components. It is computed as the residual in a regression of the log of industrial production in manufacturing on time, time squared, and monthly dummy variables. With the industrial production variable included as a covariate, the estimated Weibull model is

$-\ln \lambda = 3.7772 - 9.3515x$, $p = 1.00288$
(standard errors: 0.1394, 2.973, and 0.1217, respectively),
median strike length = 27.35 (3.667) days, $E[t] = 39.83$ days.

Note that the Weibull model is now almost identical to the exponential model ($p = 1$). Since the hazard conditioned on $x$ is approximately equal to $\lambda_i$, it follows that the hazard function is increasing in “unexpected” industrial production. A one percent increase in $x$ leads to a 9.35 percent increase in $\lambda$, which since $p \approx 1$ translates into a 9.35 percent decrease in the median strike length or about 2.6 days. (Note that $M = \ln 2/\lambda$.)

The proportional hazard model does not have a constant term. (The baseline hazard is an individual specific constant.) The estimate of $\beta$ is $-9.0726$, with an estimated standard error of 3.225. This is very similar to the estimate obtained for the Weibull model.

22.6 SUMMARY AND CONCLUSIONS

This chapter has examined three settings in which, in principle, the linear regression model of Chapter 2 would apply, but the data generating mechanism produces a nonlinear form. In the truncated regression model, the range of the dependent variable is restricted substantively. Certainly all economic data are restricted in this way; aggregate income data cannot be negative, for example. But, when data are truncated so that plausible values of the dependent variable are precluded, for example when zero values for expenditure are discarded, the data that remain are analyzed with models that explicitly account for the truncation. When data are censored, values of the dependent variable that could in principle be observed are masked. Ranges of values of the true variable being studied are observed as a single value. The basic problem this presents for model building is that in such a case, we observe variation of the independent variables without the corresponding variation in the dependent variable that might be expected. Finally, the issue of sample selection arises when the observed data are not drawn randomly from the population of interest. Failure to account for this nonrandom sampling produces a model that describes only the nonrandom subsample, not the larger population. In each case, we examined the model specification and estimation techniques which are appropriate for these variations of the regression model. Maximum likelihood is usually the method of choice, but for the third case, a two step estimator has become more common. In the final section, we examined an application, models of duration, which describe variables with limited (nonnegative) ranges of variation and which are often observed subject to censoring.

Key Terms and Concepts

• Accelerated failure time
• Attenuation
• Censored regression
• Censored variable
• Censoring
• Conditional moment test
• Count data
• Degree of truncation
• Delta method
• Duration dependence
• Duration model
• Generalized residual
• Hazard function
• Hazard rate
• Heterogeneity
• Heteroscedasticity
• Incidental truncation
• Integrated hazard function
• Inverse Mills ratio
• Lagrange multiplier test
• Marginal effects
• Negative duration dependence
• Olsen’s reparameterization
• Parametric model
• Partial likelihood
• Positive duration dependence
• Product limit
• Proportional hazard
• Risk set
• Sample selection
• Semiparametric model
• Specification error
• Survival function
• Time varying covariate
• Tobit model
• Treatment effect
• Truncated bivariate normal distribution
• Truncated distribution
• Truncated mean
• Truncated random variable
• Truncated variance
• Two step estimation
• Weibull model

Exercises

1. The following 20 observations are drawn from a censored normal distribution:

3.8396, 7.2040, 0.00000, 0.00000, 4.4132, 8.0230, 5.7971, 7.0828, 0.00000, 0.80260,
13.0670, 4.3211, 0.00000, 8.6801, 5.4571, 0.00000, 8.1021, 0.00000, 1.2526, 5.6016

The applicable model is

$$y_i^* = \mu + \varepsilon_i,$$
$$y_i = y_i^* \text{ if } \mu + \varepsilon_i > 0, \; 0 \text{ otherwise},$$
$$\varepsilon_i \sim N[0, \sigma^2].$$

Exercises 1 through 4 in this section are based on the preceding information. The OLS estimator of $\mu$ in the context of this tobit model is simply the sample mean. Compute the mean of all 20 observations. Would you expect this estimator to over- or underestimate $\mu$? If we consider only the nonzero observations, then the truncated regression model applies. The sample mean of the nonlimit observations is the least squares estimator in this context. Compute it and then comment on whether this sample mean should be an overestimate or an underestimate of the true mean.

2. We now consider the tobit model that applies to the full data set.
a. Formulate the log-likelihood for this very simple tobit model.
b. Reformulate the log-likelihood in terms of $\theta = 1/\sigma$ and $\gamma = \mu/\sigma$. Then derive the necessary conditions for maximizing the log-likelihood with respect to $\theta$ and $\gamma$.
c. Discuss how you would obtain the values of $\theta$ and $\gamma$ to solve the problem in Part b.
d. Compute the maximum likelihood estimates of $\mu$ and $\sigma$.

3. Using only the nonlimit observations, repeat Exercise 2 in the context of the truncated regression model. Estimate $\mu$ and $\sigma$ by using the method of moments estimator outlined in Example 22.2. Compare your results with those in the previous exercises.

4. Continuing to use the data in Exercise 1, consider once again only the nonzero observations. Suppose that the sampling mechanism is as follows: $y^*$ and another normally distributed random variable $z$ have population correlation 0.7. The two variables, $y^*$ and $z$, are sampled jointly. When $z$ is greater than zero, $y$ is reported. When $z$ is less than zero, both $z$ and $y$ are discarded. Exactly 35 draws were required to obtain the preceding sample. Estimate $\mu$ and $\sigma$. [Hint: Use Theorem 22.5.]

5. Derive the marginal effects for the tobit model with heteroscedasticity that is described in Section 22.3.4.a.

6. Prove that the Hessian for the tobit model in (22-14) is negative definite after Olsen’s transformation is applied to the parameters.