      
      help ardl
      -------------------------------------------------------------------------------
      
      Title
      
          ardl -- Autoregressive distributed lag regression model
      
      
      Syntax
      
              ardl depvar [indepvars] [if] [in] [, options]       (Syntax 1, for
                       estimation)
      
              ardl , { fbounds(casenum) | tbounds(casenum) }      (Syntax 2, for
                       displaying critical value tables)
      
          options                  Description
          -------------------------------------------------------------------------
          Model
            lags(numlist)          set lag lengths
            maxlags(numlist)       set maximum lag lengths
            minlag1                require at least one lag for indepvars
            maxcombs(combnum)      set maximum number of lag permutations for lag
                                     selection
            ec                     estimate with depvar in first differences and
                                     display output in error-correction form
            aic                    use AIC as information criterion
            bic                    use BIC as information criterion; default
            exog(exogvars)         exogenous variables in regression
            noconstant             suppress constant term
            trendvar(trendvarname) specify trend variable
            restricted             restrict constant or trend term (see
                                     Deterministic components)
            regstore(storename)    stores estimation results from underlying
                                     regress command as storename
            perfect                do not check for collinearity
      
          Reporting (Syntax 1)
            noctable               do not display coefficient table
            btest                  display Pesaran/Shin/Smith (2001) bounds test
            display_options        control column formats, row spacing, line width,
                                     and display of omitted variables and base and
                                     empty cells
      
          Reporting (Syntax 2)
            fbounds(casenum)       display Pesaran/Shin/Smith (2001) critical
                                     values for the F-statistics for case casenum.
                                     For more information on casenum, see
                                     Deterministic components
            tbounds(casenum)       works in analogy to option fbounds, but refers
                                     to the t-statistic
          -------------------------------------------------------------------------
      
          You must tsset your data before using ardl; see [TS] tsset.
          by is allowed; see [D] by.
          depvar and indepvars may NOT contain time-series operators.
      
      
      Description
      
          ardl fits a linear regression model of depvar on indepvars with lagged
          depvar and indepvars as additional regressors.  Information criteria are
          used to find the optimal lag lengths, if those are not pre-specified as
          an option.  Estimation output is delivered either in levels-form or in
          error-correction-form.  As an option, it displays results from the
          Pesaran/Shin/Smith (2001) bounds testing procedure for the existence of a
          levels-relationship.
      
          In Syntax 2, ardl is a convenience tool to display entire tables of
          critical values from the Pesaran/Shin/Smith (2001) bounds test.
      
      
      Abbreviations and definitions used in this help entry
      
          Abbreviations: 
                         
          ARDL:          autoregressive distributed lag
          PSS:           Pesaran/Shin/Smith (2001)
          VECM:          vector error-correction model
      
      
      Options
      
              +-------+
          ----+ Model +------------------------------------------------------------
      
          lags(numlist) specifies the number of lags for some or all regressors.
              The first number specifies the lag length for depvar that has to be
              larger than 0; the following numbers specify the lag lengths for the
              independent variables in the order they appear in indepvars.  0 is
              possible for the long-run regressor variables if option minlag1 is
              not used.  0 is never possible for the lag order of the dependent
              variable.  Missing values indicate lags that are not pre-specified.
              Information criteria are used to determine them.  For example, lags(.
              . 4) requires the second independent variable to enter with 4 lags
              while the lags of the dependent variable and the first independent
              variable are to be determined by an information criterion.
      
              The number of elements in numlist (nonnegative integers or dots) must
              equal the number of variables specified in the command line (depvar +
              indepvars).  Alternatively, numlist may contain only one element, in
              which case this number applies to all variables in depvar and
              indepvars.
      
          maxlags(numlist) specifies the maximum lag order used for optimal lag
              selection.  The first number specifies the maximum lag length for 
              depvar that has to be larger than 0; the following numbers specify
              the maximum lag lengths for the independent variables in the order
              they appear in indepvars.  The default maximum lag order is 4.
      
              Since maxlags only deals with optimal lag order selection, values for
              all or some of its elements are ignored if lags indicates
              pre-specified lags for some or all variables.
      
              The number of elements in numlist (positive integers or dots) must be
              equal to the number of variables specified in the command line
              (depvar + indepvars).  Alternatively, numlist may only contain one
              element, in which case this number applies to all variables in depvar
              and indepvars.
      
          maxcombs(combnum) specifies the maximum number of lag permutations
              allowed for the optimal lag selection.  If the number of lag
              permutations required to find the optimal lag lengths exceeds
              combnum, ardl errors out.  The default for combnum is 500.  You can
              set combnum to higher values.  However, combnum may not exceed the
              value of your current matsize setting.
      
          ec will estimate the model in 'first-difference' form (see below) and
              display the output in error-correction form.
      
          minlag1 will only consider models where indepvars have at least one lag,
              i.e. the optimal lag selection iterations will skip models where one
              or more of these variables have a lag length of zero.  An implication
              of this is that you may not use option minlag1 in conjunction with a
              lag specification in option lags that sets the lag order of any
              variable to zero.
      
              If option ec is also specified, the long-run regressors (other than
              the dependent variable) in the error-correction output are expressed
              in terms of time t-1.  The default is to write them in terms of time
              t.  The two parameterizations yield identical estimates, except for
              the first first-difference term of each long-run regressor.
      
          aic is used to determine the optimal lag lengths with the Akaike
              information criterion.
      
          bic is used to determine the optimal lag lengths with the Bayesian
              information criterion, which is the default.
      
          exog(exogvars) specifies additional exogenous variables to be added to
              the regression.
      
          noconstant suppresses the constant term in the model.
      
          trendvar(trendvarname) lets you add a trend term to your model.
              trendvarname must exist in the data set before execution of ardl and
              it must be collinear with timevar, where timevar is the time variable
              set by tsset.  A convenient shortcut that skips the creation of a
              separate time trend variable is to use trendvar(timevar).
      
          restricted will restrict either the constant term or the time trend, if
              any of the two are specified.  See Deterministic components below.
      
              If no deterministics are in the model, restricted will cause an
              error.
      
          regstore(storename) will store the estimation results from the underlying
              regress command.  This is useful if you want to perform
              postestimation operations like predicting residuals, regression
              diagnostics, and so forth.  See estimates and regress postestimation.
      
              Note that if an estimation results set called storename already
              exists, option regstore will overwrite it without warning.
      
          perfect omits the check for collinearity among the regressors.
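
          As an illustration of how the model options combine, the following
          sketch assumes that ardl is installed and uses Stata's lutkepohl2
          example data (as in the Examples section below).  The choice of
          variables and all option values are illustrative only.

              ```stata
              * Sketch only: assumes ardl is installed; values are illustrative
              webuse lutkepohl2, clear

              * Fix 2 lags for ln_inv and let AIC choose the lags of the two
              * regressors, searching up to 6 lags of ln_inc and up to 2 lags
              * of ln_consump (the first maxlags element is ignored because
              * lags() pre-specifies the lag order of ln_inv)
              ardl ln_inv ln_inc ln_consump, lags(2 . .) maxlags(4 6 2) aic

              * A single number in maxlags() applies to depvar and all indepvars;
              * minlag1 skips models in which a regressor has zero lags
              ardl ln_inv ln_inc ln_consump, maxlags(3) minlag1 maxcombs(100)
              matrix list e(lags)
              ```

          Assuming the search visits every combination of 1 to maxlags lags for
          depvar and 0 to maxlags lags for each regressor, the default maximum
          lag order of 4 with k regressors implies 4 x 5^k candidate models:
          100 for k = 2, 500 for k = 3, and 2500 for k = 4.  Under that
          assumption the last case would exceed the default maxcombs of 500.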
      
              +----------------------+
          ----+ Reporting (Syntax 1) +---------------------------------------------
      
          noctable suppresses the display of the coefficient table.  This is
              useful if you only want to look at the PSS bounds test.
      
          btest displays the F- and t-statistics in relation to the long-run
              relationship, and displays critical values for these statistics
              tabulated in PSS.
      
          display_options:  noomitted, vsquish, noemptycells, baselevels,
              allbaselevels, cformat(%fmt), pformat(%fmt), sformat(%fmt), and
              nolstretch; see [R] estimation options.
      
              +----------------------+
          ----+ Reporting (Syntax 2) +---------------------------------------------
      
          fbounds(casenum) displays an entire table of critical values for the
              F-statistic that can be used for testing for the existence of a
              long-run relationship, according to casenum, where casenum pins down
              the deterministic terms in the model, and is an integer from 1 to 5.
              See Deterministic components for more information on the meaning of
              casenum.
      
          tbounds(casenum) works in analogy to fbounds but concerns the
              t-statistic.  Here casenum must be one of 1, 3, or 5.
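
          For instance, the critical value tables for a model with an
          unrestricted constant and no trend (case 3) could be displayed as
          sketched below.  The sketch assumes that ardl is installed; the
          choice of case 3 is illustrative.

              ```stata
              * Sketch only: assumes ardl is installed
              * Critical values for the F-statistic, case 3
              ardl, fbounds(3)

              * Critical values for the t-statistic, case 3
              * (tbounds accepts only casenum 1, 3, or 5)
              ardl, tbounds(3)
              ```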
      
      
      Remarks
      
          Remarks are presented under the following headings:
      
          Introduction
          Terminology
          Lag specification
          Deterministic components
          Bounds test for a level relationship
          Pure autoregressive processes
          Postestimation
          Replay
      
      Introduction
      
          An autoregressive distributed lag (ARDL) model of order p and q, denoted
          ARDL(p,q), regresses the dependent variable on p of its own lags and on q
          lags of one or more additional regressors.  Multiple regressors are
          allowed to have different lag orders, in which case the model becomes an
          ARDL(p, q_1, ..., q_k) model, where k is the number of non-deterministic
          regressors.  ARDL models can, among other things, be used for the
          estimation and testing of cointegration relationships.  Key contributions
          in this area are Pesaran and Shin (1999) and Pesaran, Shin, and Smith
          (2001).  For a succinct exposition of ARDL models in the context of
          cointegration, see Hassler and Wolters (2005).
      
      Terminology
      
          The regression equation in the sense of the preceding paragraph is
          referred to as the levels-equation.  This equation can be rewritten such
          that the differenced depvar is expressed in terms of the lagged depvar,
          levels of indepvars, and differenced terms of (depvar, indepvars) up to
          orders (p-1, q_1-1, ...,q_k-1).  This way of writing the ARDL model is
          referred to here as the first-difference form or equation, although this
          is a slight abuse of terminology since it is a mere reparameterization of
          the levels-equation.  Dividing the coefficients of the levels regressors
          by the coefficient of the lagged depvar and appropriately accounting for
          model deterministics then yields the error-correction-form.  It separates
          the adjustment coefficient to deviations from long-run equilibrium,
          long-run coefficients, and short-run coefficients.
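
          The three forms can be written out for the simplest case.  The sketch
          below assumes an ARDL(p,q) model with a single regressor x, an
          unrestricted constant, and the default parameterization in which the
          level of x enters at time t.  The symbols (phi, beta, pi, psi, alpha,
          theta) and the sign conventions are chosen for this illustration and
          are not taken from the ardl output.

              ```latex
              % Levels equation
              \begin{align*}
              y_t &= c_0 + \sum_{i=1}^{p} \phi_i \, y_{t-i}
                        + \sum_{j=0}^{q} \beta_j \, x_{t-j} + u_t
              \end{align*}

              % First-difference form: the same model, reparameterized
              \begin{align*}
              \Delta y_t &= c_0 + \pi_y \, y_{t-1} + \pi_x \, x_t
                        + \sum_{i=1}^{p-1} \psi_{y,i} \, \Delta y_{t-i}
                        + \sum_{j=0}^{q-1} \psi_{x,j} \, \Delta x_{t-j} + u_t, \\
              \pi_y &= \sum_{i=1}^{p} \phi_i - 1, \qquad
              \pi_x = \sum_{j=0}^{q} \beta_j, \\
              \psi_{y,i} &= -\sum_{l=i+1}^{p} \phi_l, \qquad
              \psi_{x,j} = -\sum_{l=j+1}^{q} \beta_l
              \end{align*}

              % Error-correction form
              \begin{align*}
              \Delta y_t &= c_0 - \alpha \left( y_{t-1} - \theta \, x_t \right)
                        + \sum_{i=1}^{p-1} \psi_{y,i} \, \Delta y_{t-i}
                        + \sum_{j=0}^{q-1} \psi_{x,j} \, \Delta x_{t-j} + u_t, \\
              \alpha &= -\pi_y = 1 - \sum_{i=1}^{p} \phi_i, \qquad
              \theta = -\frac{\pi_x}{\pi_y}
                     = \frac{\sum_{j=0}^{q} \beta_j}{1 - \sum_{i=1}^{p} \phi_i}
              \end{align*}
              ```

          In this notation alpha is the speed of adjustment, theta is the
          long-run coefficient, and the psi terms are the short-run
          coefficients.  Substituting alpha and theta back gives
          -alpha (y_{t-1} - theta x_t) = pi_y y_{t-1} + pi_x x_t, so the
          error-correction form reproduces the first-difference form exactly.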
      
          ardl without option ec will run a regression of the levels-equation, save
          the dependent variable and the regressors in the macros e(depvar) and
          e(regressors) and in the matrix e(b), and display a corresponding table
          of estimates.  If option ec is used, ardl will run a regression
          corresponding to the first-difference equation and save the dependent
          variable and the regressors in the macros e(depvar) and e(regressors).
          The coefficient output table and e(b) will be in terms of the
          error-correction form.
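
          The difference between the two forms can be inspected in the saved
          results.  The following sketch assumes that ardl is installed and
          uses the lutkepohl2 example data; the lag orders are illustrative.

              ```stata
              * Sketch only: assumes ardl is installed; lag orders are illustrative
              webuse lutkepohl2, clear

              * Levels form: e(depvar), e(regressors), and e(b) refer to the
              * levels-equation
              ardl ln_consump ln_inc, lags(2 2)
              display "`e(model)'"
              display "`e(depvar)'"
              display "`e(regressors)'"
              matrix list e(b)

              * Error-correction form: e(depvar) and e(regressors) refer to the
              * first-difference equation, e(b) to the error-correction form
              ardl ln_consump ln_inc, lags(2 2) ec
              display "`e(model)'"
              display "`e(depvar)'"
              display "`e(regressors)'"
              matrix list e(b)
              ```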
      
      Lag specification
      
          Lags specified in options lags and maxlags refer to lags in the levels
          equation, whether option ec is used or not.  For example, if you use
          lags(2 4 4), the dependent variable will have two lags in the levels
          regression and each of the two independent variables will have four lags
          in the levels regression.  The lag length of the first differences in
          the first-difference equation will be one less for each variable.
      
          In a similar fashion, any lag information saved in e() will refer to the
          levels equation.
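
          A minimal sketch of this convention, assuming that ardl is installed
          and using the lutkepohl2 example data:

              ```stata
              * Sketch only: assumes ardl is installed
              webuse lutkepohl2, clear

              * Lags are stated for the levels equation even though ec is used
              ardl ln_inv ln_inc ln_consump, lags(2 4 4) ec

              * Per the text above, e(lags) refers to the levels equation, so it
              * should hold the values 2, 4, and 4
              matrix list e(lags)

              * The differenced terms in the first-difference equation have one
              * lag less per variable; they appear in the regressor list
              display "`e(regressors)'"
              ```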
      
      Deterministic components
      
          In the vector error-correction model (VECM) literature, it is common to
          distinguish five different cases of model deterministics:
      
             casenum   description
             ------------------------------------------------------------------------
             1         no constant, no trend
             2         restricted constant, no trend
             3         unrestricted constant, no trend
             4         unrestricted constant, restricted trend
             5         unrestricted constant, unrestricted trend
      
          Rewriting the levels-equation in first-difference form yields
          restrictions on the constant term and the linear trend.  Cases 2 and 4
          impose the implied restriction.  If these restrictions are ignored (cases
          3 and 5), a constant term in the first-difference equation can generate a
          linear trend in the levels equation.  Likewise, an unrestricted trend in
          the first-difference equation can generate a quadratic trend in the
          levels-equation.  For a more detailed exposition, see for example
          Lütkepohl (2005), section 6.4, or [TS] vec.
      
          Stata's vec command, which estimates VECMs, distinguishes between the
          same five cases through its trend option.  The following table provides a
          mapping between case numbers and vec syntax.
      
             casenum   vec syntax
             ------------------------------------------------------------------------
             1         trend(none)
             2         trend(rconstant)
             3         trend(constant)
             4         trend(rtrend)
             5         trend(trend)
      
          The ardl syntax for determining the casenum is different from vec but
          close to standard Stata syntax for linear regressions.  A constant term
          can be omitted by using option noconstant.  To include a time trend,
          generate a separate trend variable and include it in option trendvar.  If
          you want to have a linear time trend and your time variable is named
          timevar, you can simply use trendvar(timevar).  The table below
          provides a mapping between case numbers and ardl options.  Note that a
          constant is included in the model by default which is why the option
          constant below is in brackets.  It is redundant to specify this option
          explicitly.
      
             casenum   ardl options
             ------------------------------------------------------------------------
             1         noconstant
             2         [constant] restricted
             3         [constant]
             4         [constant] trendvar(trendvarname) restricted
             5         [constant] trendvar(trendvarname)
      
          Whereas the specification of deterministics has considerable implications
          for the estimation procedures of VECMs, this is not so for ARDL models.
          In the conditional ARDL modelling approach proposed by PSS, for example,
          cases 2 and 3 and cases 4 and 5 are based on identical linear regressions
          of the first-difference equation.  The distinction within each case-pair
          concerns the interpretation of the deterministic terms, i.e. whether they
          are considered to be part of the long-run relationship or not.
          Accordingly, the asymptotic distribution for the test for a
          levels-relationship advanced in PSS is different for each case.
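
          The mapping between case numbers and ardl options can be tried out as
          sketched below.  The sketch assumes that ardl is installed and that
          the time variable of the lutkepohl2 example data is named qtr; the
          lag order is illustrative.

              ```stata
              * Sketch only: assumes ardl is installed and that the tsset time
              * variable in lutkepohl2 is qtr
              webuse lutkepohl2, clear

              ardl ln_consump ln_inc, lags(2) ec noconstant                // case 1
              display e(case)
              ardl ln_consump ln_inc, lags(2) ec restricted                // case 2
              display e(case)
              ardl ln_consump ln_inc, lags(2) ec                           // case 3
              display e(case)
              ardl ln_consump ln_inc, lags(2) ec trendvar(qtr) restricted  // case 4
              display e(case)
              ardl ln_consump ln_inc, lags(2) ec trendvar(qtr)             // case 5
              display e(case)
              ```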
      
      Bounds test for a level relationship
      
          ardl implements the bounds test for a levels relationship proposed by
          PSS.  When option btest is used, the F-statistic and the t-statistic,
          which are dependent on casenum, are displayed along with critical values
          of the associated non-standard distributions provided by PSS.
      
          You must use option ec in your ardl model for bounds test-related
          statistics to be available.
      
          For cases 2 and 4, only the F-statistic is calculated; the t-statistic
          is available only for cases 1, 3, and 5 (see option tbounds).
      
          To avoid pretesting problems, PSS suggest applying the bounds test only
          to ARDL models without restrictions on the short-run coefficients (i.e.
          with a sufficiently high and common lag order for the regressors).
          However, ardl saves the F-statistic and t-statistic as well as the
          relevant critical values in e(), independently of the model
          specification.  Moreover, this information is saved in e() in all cases,
          regardless of whether the btest option is used or not.  The latter option
          only concerns the display of test results in the Stata results window.
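
          The saved test results can be retrieved as in the following sketch,
          which assumes that ardl is installed and uses the lutkepohl2 example
          data with an illustrative lag order.

              ```stata
              * Sketch only: assumes ardl is installed; lag order is illustrative
              webuse lutkepohl2, clear

              * ec is required for the bounds test; btest only controls display
              ardl ln_consump ln_inc, lags(4) ec btest

              * Test statistics and critical values saved in e()
              display "case = " e(case)
              display "F    = " e(F_pss)
              display "t    = " e(t_pss)
              matrix list e(F_critval)
              matrix list e(t_critval)
              ```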
      
      Pure autoregressive processes
      
          You may omit the specification of indepvars, in which case the process
          reduces to a pure autoregressive one.  Consequently, you can use ardl for
          the optimal lag selection of pure autoregressive processes.  See varsoc
          for an alternative way of doing this.
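
          A minimal sketch of lag selection for a pure autoregressive process,
          assuming that ardl is installed; the variable and the maximum lag
          order are illustrative.

              ```stata
              * Sketch only: assumes ardl is installed; values are illustrative
              webuse lutkepohl2, clear

              * No indepvars: choose the AR order of ln_inv by AIC, up to 8 lags
              ardl ln_inv, maxlags(8) aic
              matrix list e(lags)

              * Alternative using official Stata
              varsoc ln_inv, maxlag(8)
              ```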
      
      Postestimation
      
          The standard subcommands of estat (i.e. estat summarize, estat vce, estat
          ic) work as usual.
      
          You can use option regstore to get predicted values, residuals, and other
          results.  This option stores the estimation results from Stata's regress
          which underlies ardl in Stata's estimation results catalogue (see 
          estimates).  After estimation using ardl, you can use estimates restore
          to recover results from regress, and then use the many tools of regress
          postestimation to perform the desired calculations.  It is recommended
          that you store the ardl results before restoring regress results, so you
          can easily switch back.
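
          The workflow described above could look as follows.  The sketch
          assumes that ardl is installed; the names ecreg, ecardl, and uhat are
          hypothetical, and the Breusch-Godfrey test is just one example of a
          regress postestimation tool.

              ```stata
              * Sketch only: assumes ardl is installed; ecreg, ecardl, and uhat
              * are hypothetical names
              webuse lutkepohl2, clear
              ardl ln_consump ln_inc, lags(2 2) ec regstore(ecreg)

              * Standard estat subcommands work directly after ardl
              estat ic

              * Store the ardl results, then switch to the underlying regression
              estimates store ecardl
              estimates restore ecreg

              * Residuals and a serial-correlation test from regress postestimation
              predict double uhat if e(sample), residuals
              estat bgodfrey, lags(1)

              * Switch back to the ardl results
              estimates restore ecardl
              ```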
      
      Replay
      
          Replay of estimation results works as usual:  Type ardl, a comma, and
          then any of the reporting options for Syntax 1.
      
      
      Examples
      
          We use Stata's example data set 'lutkepohl2' that contains quarterly data
          for German aggregate income, investment, and consumption. We estimate an
          ARDL model in levels-form using the optimal number of lags according to
          BIC.
      
              . webuse lutkepohl2
              . ardl ln_inv ln_inc ln_consump, lags(. . 4) maxlags(3 3 3)

              Lags for ln_inv and ln_inc are optimally selected.  ln_consump is
                  pre-specified to receive a lag order of 4, so its maxlags
                  setting of 3 is ignored.  We can display the lags selected by:
              . matrix list e(lags)
      
              To estimate the error-correction coefficients, use option ec.  We use
                  option regstore also so we can generate predicted values later.
              . ardl ln_inv ln_inc ln_consump, ec regstore(lutreg)
      
              Predicted values are generated by restoring the regress result:
              . estimates store lutardl
              . estimates restore lutreg
      
              We can look at the regress results:
              . regress
      
              . predict yhat if e(sample), xb
              . estimates restore lutardl
      
              Since we have used option ec in the ardl estimation, the predicted
                  values refer to the first difference of ln_inv, not to the level:
              . tsline yhat d.ln_inv
      
      
          To give an example which is more meaningful from an economic perspective,
          we now want to examine a potential levels relationship between
          consumption and income.  The unrestricted constant in the model below is
          capable of generating the upward drift in the variables that is visible
          from their time-series graphs.
      
              . ardl ln_consump ln_inc, lags(4) ec
      
              The long-run coefficient on income is close to 1 and has a tight
                  confidence interval.  To check whether a long-run relationship
                  between consumption and income can be statistically confirmed, we
                  replay the estimation output with the noctable and btest options,
                  which displays results from the PSS bounds test.
      
              . ardl, noctable btest
      
              The output shows that we cannot confirm the existence of a
                  levels-relationship.  Neither the F-statistic nor the t-statistic
                  rejects the null hypothesis of no levels-relationship.
      
      
      Saved results
      
          ardl saves the following in e():
      
          Scalars        
            e(N)                number of observations
            e(df_m)             model degrees of freedom
            e(df_r)             residual degrees of freedom
            e(mss)              model sum of squares
            e(rss)              residual sum of squares
            e(rmse)             root mean squared error
            e(r2)               R-squared
            e(r2_a)             adjusted R-squared
            e(ll)               log likelihood under additional assumption of
                                  i.i.d. normal errors
            e(N_gaps)           number of gaps in sample (note: not number of
                                  missings)
            e(tmin)             first time period in sample
            e(tmax)             last time period in sample
            e(rank)             rank of e(V)
                                
            if option ec was used:
            e(F_pss)            F-statistic, calculated according to casenum
            e(t_pss)            t-statistic, calculated according to casenum
            e(case)             casenum for model deterministics
      
          Macros         
            e(cmd)              ardl
            e(cmdline)          command as typed
            e(model)            level or ec
            e(title)            title in estimation output
            e(depvar)           name of dependent variable
            e(regressors)       full set of regressors in the ARDL model, as
                                  estimated by regress
            e(tsfmt)            format for the current time variable
            e(properties)       b V
                                
            if option ec was used:
            e(lrxvars)          non-deterministic regressors in the long-run
                                  relationship
            e(lrdet)            deterministic term in the long-run relationship
            e(srvars)           short-run (differenced) regressors
            e(exogvars)         exogenous variables
            e(det)              deterministic terms in the model, but not in the
                                  long-run relationship
      
          Matrices       
            e(b)                coefficient vector of the linear regression model
            e(V)                variance-covariance matrix of the estimators in the
                                  linear regression model
            e(lagcombs)         combinations of lags across which lag selection has
                                  searched; includes the information criterion for
                                  each lag specification
            e(maxlags)          vector with maximum lag lengths of depvar and 
                                  indepvars in the levels representation used for
                                  optimal lag selection
            e(lags)             vector with number of lags of depvar and indepvars
                                  in the levels representation
                                
            if option ec was used:
            e(F_critval)        critical values, F-statistic, PSS bounds test for
                                  casenum
            e(t_critval)        critical values, t-statistic, PSS bounds test for
                                  casenum
      
          Functions      
            e(sample)           marks estimation sample
      
      
      Authors
      
          Original Author: Sebastian Kripfganz, Goethe University Frankfurt,
          kripfganz@wiwi.uni-frankfurt.de
      
          Code modified by Daniel Schneider, Goethe University Frankfurt,
          schneider_daniel@hotmail.com
      
      
      References
      
          Hassler, U. and J. Wolters (2005): Autoregressive Distributed Lag Models
              and Cointegration.  Freie Universität Berlin, Working Paper
              No.2005/22.
      
          Lütkepohl, H. (2005): New Introduction to Multiple Time Series Analysis.
              Berlin, Heidelberg: Springer Verlag.
      
          Pesaran, M.H. and Y. Shin (1999): An Autoregressive Distributed Lag
              Modelling Approach to Cointegration Analysis.  In: Strom, S. (Ed.):
              Econometrics and Economic Theory in the 20th Century: The Ragnar
              Frisch Centennial Symposium.  Cambridge, UK: Cambridge University
              Press.
      
          Pesaran, M.H., Shin, Y. and R.J. Smith (2001): Bounds Testing Approaches
              to the Analysis of Level Relationships.  Journal of Applied
              Econometrics, 16 (3), 289-326.
      
      
      Also see