Elizabeth Warren Proposes "Wealth Tax" - Page 7 - Politics Forum.org | PoFo

#15164782
wat0n wrote:Well, in the 2003 paper the pre-treatment period is 20 years and they extrapolate up to year 22. It's not that crazy, to be honest - and here I would say it's actually relevant since it's hard to tell what's going on at the end in the France vs synthetic comparison.


Yeah but the 2003 paper looks at a single country with regions as donors and a fairly impactful event, i.e. terrorism.

In fact, all three Abadie papers look at impactful events in their context: Basque terrorism, German reunification, Proposition 99. Unlike that wealth tax in France.

wat0n wrote:You could then simply remove the predictor variables where France is an outlier (or close to being one) in the pre- or post-treatment periods (or both, but if an important selected variable was an outlier in the pre-treatment period you should see the synthetic France having a low pre-treatment fit), if that's a pressing concern - you can even automate this process. But even then, I'd only do it after looking at what was selected without doing anything; at least the R packages will also tell you the variable weights. I haven't read that 2014 paper but I'm not sure it's wise to just use the same donor countries and variables for Germany on France without doing some analysis.

I also don't think Mexico could be easily regarded as being similar to France before 1982 (or 1994 for that matter).


Using the same donor pool for France and Germany seems perfectly reasonable. They could have looked at structural shocks in the sample period like Abadie does. The inclusion of Mexico is curious though, especially since they exclude Turkey and Chile for being too dissimilar in income and wealth.

wat0n wrote:Why not? You could make the case the owed future SS payments are part of your assets... And so it would undermine one of the arguments for a wealth tax (the controversial claim that wealth inequality has actually increased over time).


You mean pension funds?
#15164788
Rugoz wrote:Yeah but the 2003 paper looks at a single country with regions as donors and a fairly impactful event, i.e. terrorism.

In fact, all three Abadie papers look at impactful events in their context: Basque terrorism, German reunification, Proposition 99. Unlike that wealth tax in France.


Why wouldn't a tax reform be an impactful event in its own context?

Rugoz wrote:Using the same donor pool for France and Germany seems perfectly reasonable. They could have looked at structural shocks in the sample period like Abadie does. The inclusion of Mexico is curious though, especially since they exclude Turkey and Chile for being too dissimilar in income and wealth.


"Seems" - you said it. It's possible not to rely on what merely seems reasonable, but to check whether it actually is. That's one of the strengths of synth controls.

It's possible Chile and Turkey were part of the donor pool but the optimizer weighted them at 0. We don't really know - that's my point!

Rugoz wrote:You mean pension funds?


Of the US social security? Yes, the payments are actually progressive (the replacement rate is decreasing in income, IIRC).
#15164865
wat0n wrote:Why wouldn't a tax reform be an impactful event in its own context?


When looking at the effect on GDP ("the context"), it obviously doesn't compare to an event like German reunification.

wat0n wrote:"Seems" - you said it. It's possible not to rely on what merely seems reasonable, but to check whether it actually is. That's one of the strengths of synth controls.

It's possible Chile and Turkey were part of the donor pool but the optimizer weighted them at 0. We don't really know - that's my point!


Back to square one?

Abadie is quite insistent on being careful when putting together the donor pool. Overfitting is obviously always an issue, but he also mentions shocks and interpolation bias, which I already touched upon before. E.g.:

Constructing a donor pool of comparison units requires some care. First, units affected by the event or intervention of interest or by events of a similar nature should be excluded from the donor pool. In addition, units that may have suffered large idiosyncratic shocks to the outcome of interest during the study period should also be excluded if such shocks would have not affected the treated unit in the absence of the treatment. Finally, to avoid interpolation biases, it is important to restrict the donor pool to units with characteristics similar to the treated unit. Another reason to restrict the size of the donor pool and consider only units similar to the treated unit is to avoid overfitting. Overfitting arises when the characteristics of the unit affected by the intervention or event of interest are artificially matched by combining idiosyncratic variations in a large sample of unaffected units. The risk of overfitting motivates our adoption of the cross-validation techniques applied in the empirical section below.


I'm not familiar with the interpolation bias, but the other two are clear to me.

Besides, economists usually want to tell a story, hence a mere statistical relationship isn't good enough.

Let me rephrase it then: Using OECD countries (excluding Mexico, Chile and Turkey, i.e. only industrialized countries) as donors for France is perfectly reasonable.

wat0n wrote:Of the US social security? Yes, the payments are actually progressive (the replacement rate is decreasing in income, IIRC).


I'm not familiar with the US pension system, but I suppose the interest paid is not taxed, hence excluding pension funds from a wealth tax makes no conceptual difference vis-a-vis a capital income tax. Although it might make a difference with differing returns.
#15164883
Rugoz wrote:When looking at the effect on GDP ("the context"), it obviously doesn't compare to an event like German reunification.


Ah, yes, that is true. But then one would need to look at other outcomes, if you consider the tax was not designed to collect much - such as measuring wealth inequality over time (a hard task).

Rugoz wrote:Back to square one?

Abadie is quite insistent on being careful when putting together the donor pool. Overfitting is obviously always an issue, but he also mentions shocks and interpolation bias, which I already touched upon before. E.g.:



I'm not familiar with the interpolation bias, but the other two are clear to me.


But you do that by removing covariates where the treated unit is the outlier - because you cannot interpolate it using the other donors anyway. Another issue with the paper is that it doesn't include the weights for each covariate, which is kind of odd if you ask me - wouldn't knowing those also be of interest? At least the traditional R package (synth) will include those in the output.

Of course, if other units were also treated then you would need to remove them (although in this particular case, it's trickier to decide what the treatment is. Is the treatment defined as "having a wealth tax" or "having a wealth tax, repealing it, then introducing a different version of it afterwards"? France wasn't treated during the whole post-treatment period, after all). That goes without saying. As for including "only units that are otherwise similar to the treated one", this is... complicated. After all, you would need to make a subjective call on that matter (if you ask me), and furthermore somehow argue that those units should be excluded even though they help with model fit after cross-validation. I don't see why one wouldn't let the fitting algorithm sort that out, since it's based on the available data.

In this case, the subjective call would be to say that only OECD countries should be included. It's subjective because 1) if we took this seriously, only countries that were in the OECD by 1982 should be included - the measure of similarity is supposed to refer to the pre-treatment period; and 2) you're assuming OECD membership is based on economic similarity only. It's not.

Rugoz wrote:Besides, economists usually want to tell a story, hence a mere statistical relationship isn't good enough.


Indeed, and another thing economists love is to demolish someone else's story by arguing it has statistical issues.

Rugoz wrote:Let me rephrase it then: Using OECD countries (excluding Mexico, Chile and Turkey, i.e. only industrialized countries) as donors for France is perfectly reasonable.


You could rephrase it even better: Using capitalist industrialized countries circa 1982 may be reasonable. In this case, it would materially affect some of the results since Mexico has large weights.

But, one may argue that criterion may not be good enough. For instance, should one really include industrialized countries with a GDP structure that was radically different from France's, again, in 1982? Or why include covariates for which France is an outlier? When does the trimming of the donor pool (of units and covariates) stop?

Also, why would you do that when similar units will tend to get greater weights anyway? Aren't the weights a data-based indicator of overall similarity, up to a point? And lastly, doesn't the data sort of "trim itself" in a way? After all, many countries don't have long series anyway, even more so if the treatment begins before 1990. So even your largest possible donor pool may be tiny in practice, even more so after you remove other units that were treated.

Rugoz wrote:I'm not familiar with the US pension system, but I suppose the interest paid is not taxed, hence excluding pension funds from a wealth tax makes no conceptual difference vis-a-vis a capital income tax. Although it might make a difference with differing returns.


Private pension fund stocks are counted as wealth in the literature, as far as I'm aware. The issue has to do with the funds in PAYGO systems only.
#15165156
wat0n wrote:But you do that by removing covariates where the treated unit is the outlier - because you cannot interpolate it using the other donors anyway. Another issue with the paper is that it doesn't include the weights for each covariate, which is kind of odd if you ask me - wouldn't knowing those also be of interest? At least the traditional R package (synth) will include those in the output.


Do what, deal with the interpolation bias? AFAIK the interpolation bias occurs when the outcome is non-linear in the predictors.

wat0n wrote:Of course, if other units were also treated then you would need to remove them (although in this particular case, it's trickier to decide what the treatment is. Is the treatment defined as "having a wealth tax" or "having a wealth tax, repealing it, then introducing a different version of it afterwards"? France wasn't treated during the whole post-treatment period, after all). That goes without saying. As for including "only units that are otherwise similar to the treated one", this is... complicated. After all, you would need to make a subjective call on that matter (if you ask me), and furthermore somehow argue that those units should be excluded even though they help with model fit after cross-validation. I don't see why one wouldn't let the fitting algorithm sort that out, since it's based on the available data.


The pre-treatment period is about 20 years, hence they have a meagre 20 data points to do CV with. Meanwhile they have 6 predictors and corresponding weights. I have no idea how they pick the predictor weight candidates, which have to be few in order to not overfit the validation data.

In any case, I'm sceptical of a data-driven approach given how little data there is available.

wat0n wrote:In this case, the subjective call would be to say that only OECD countries should be included. It's subjective because 1) if we took this seriously only those in the OECD by 1982 should be included - the measure of similarity is supposed to refer to the pre-treatment period, 2) you're assuming OECD membership is somehow based on economic similarity only. It's not.


OECD basically means industrialized for the time period in question (the 60s onwards). Again, without Chile, Mexico and Turkey, which Abadie also excludes.

wat0n wrote:Private pension fund stocks are counted as wealth in the literature, as far as I'm aware. The issue has to do with the funds in PAYGO systems only.


Private funds are already excluded from capital taxes though, let alone publicly mandated ones (I'm only interested in the capital vs wealth tax issue).
#15165160
Rugoz wrote:Do what, deal with the interpolation bias? AFAIK the interpolation bias occurs when the outcome is non-linear in the predictors.


Or if you are trying to interpolate a predictor that can't be interpolated using a convex combination of the donors... Which will happen if your treated unit is an outlier in the distribution. I'd then say that's probably a "bad" covariate.

It's mentioned by Abadie in one of the fragments you cited.
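The point is easy to see with a scalar covariate - a toy sketch with made-up numbers:

```python
import numpy as np

# A convex combination of donor values can never leave the donors' range,
# so if the treated unit's value is outside that range the covariate
# simply can't be matched by any weighting of the donors.
donors = np.array([2.1, 3.4, 1.8, 2.9, 3.0])    # donor covariate values
treated = 5.2                                   # treated unit: an outlier
w = np.random.default_rng(4).dirichlet(np.ones(donors.size))  # random weights
combo = donors @ w                              # any convex combination...
assert donors.min() <= combo <= donors.max()    # ...stays inside the range
print(treated > donors.max())                   # prints True: can't interpolate
```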

Rugoz wrote:The pre-treatment period is about 20 years, hence they have a meagre 20 data points to do CV with. Meanwhile they have 6 predictors and corresponding weights. I have no idea how they pick the predictor weight candidates, which have to be few in order to not overfit the validation data.

In any case, I'm sceptical of a data-driven approach given how little data there is available.


You can do it if you use enough predictors. It's not ideal but it's certainly possible.

Rugoz wrote:Private funds are already excluded from capital taxes though, let alone publicly mandated ones (I'm only interested in the capital vs wealth tax issue).


Ah, I think that's the issue. I'm more interested in the reasoning for a wealth tax beyond just capital taxes, simply because it doesn't seem to be advocated as a replacement for them.
#15165348
wat0n wrote:Or if you are trying to interpolate a predictor that can't be interpolated using a convex combination of the donors... Which will happen if your treated unit is an outlier in the distribution. I'd then say that's probably a "bad" covariate.


Why not use some sort of regularization instead of the "sums to one" constraint anyway? I mean, I guess the constraint makes sense for certain variables.

wat0n wrote:You can do it if you use enough predictors. It's not ideal but it's certainly possible.


"Enough predictors"? If anything, having more predictors makes it worse.
#15165366
Rugoz wrote:Why not use some sort of regularization instead of the "sums to one" constraint anyway? I mean, I guess the constraint makes sense for certain variables.


That would in all fairness require a change in the algorithm (even if the broad idea remains the same). In the end it depends on the package/optimizer you use.

Rugoz wrote:"Enough predictors"? If anything, having more predictors makes it worse.


The optimizer will look both at the units and the predictors, and will select among both after doing its own regularization. I agree it's not ideal to have a short pre-treatment series but it can't be helped, can it? You have to get information for fitting from somewhere...
#15165396
wat0n wrote:The optimizer will look both at the units and the predictors, and will select among both after doing its own regularization. I agree it's not ideal to have a short pre-treatment series but it can't be helped, can it? You have to get information for fitting from somewhere...


It can be "helped" by removing degrees of freedom through "expert judgement", e.g. limiting the donor pool as discussed before.
#15165400
Rugoz wrote:It can be "helped" by removing degrees of freedom through "expert judgement", e.g. limiting the donor pool as discussed before.


Or even better, add penalty terms (i.e. regularize further). It's better because you are not relying on subjective criteria to limit the donor pool.
#15165550
wat0n wrote:Or even better, add penalty terms (i.e. regularize further). It's better because you are not relying on subjective criteria to limit the donor pool.


The regularization penalty is (usually) chosen by CV, so not really. Of course, the type of regularization applied is itself already an "expert judgement". In Bayesian terms you're simply picking a different prior, e.g. a normal prior with ridge or a Laplace prior with lasso.
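To make that concrete, here's a toy sketch (sklearn, made-up data) of the penalty strength being chosen by CV, so the researcher only picks the form of the penalty, not its size:

```python
import numpy as np
from sklearn.linear_model import RidgeCV, LassoCV

# Made-up short panel: 40 observations, 6 predictors, sparse true signal.
rng = np.random.default_rng(1)
X = rng.normal(size=(40, 6))
beta = np.array([2.0, 0.0, 0.0, -1.0, 0.0, 0.0])
y = X @ beta + rng.normal(scale=0.5, size=40)

# Ridge ~ normal prior, lasso ~ Laplace prior; in both cases the penalty
# strength alpha is selected by cross-validation, not set by hand.
ridge = RidgeCV(alphas=np.logspace(-3, 3, 25)).fit(X, y)
lasso = LassoCV(cv=5, random_state=0).fit(X, y)
print(ridge.alpha_, lasso.alpha_)   # the CV-selected penalty strengths
```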
#15165609
Rugoz wrote:The regularization penalty is (usually) chosen by CV, so not really. Of course, the type of regularization applied is itself already an "expert judgement". In Bayesian terms you're simply picking a different prior, e.g. a normal prior with ridge or a Laplace prior with lasso.


It is, but it's not really the same as deliberately choosing what sample to throw away. You can't really compare the two here.

Of course, you can use a type of regularization or an optimizer that will select sparser weight matrices than others (that's what I meant). But you can't really compare that to "expert judgment". You can also justify any choice based on statistical criteria.

I'll pick a data-based, replicable algorithm over "expert judgment" anytime.
#15165809
wat0n wrote:It is, but it's not really the same as deliberately choosing what sample to throw away. You can't really compare the two here.

Of course, you can use a type of regularization or an optimizer that will select sparser weight matrices than others (that's what I meant). But you can't really compare that to "expert judgment". You can also justify any choice based on statistical criteria.


Throwing away variables and thereby reducing the degrees of freedom (I presume this is in principle equivalent to throwing away units in the synthetic control method) is a way to make the model less prone to overfitting, i.e. you reduce P relative to N. Throwing away part of the sample would reduce N; that's not what I meant.

The alternative is to limit the parameter space, i.e. to reduce the freedom of the parameters. Saying the weights must sum up to one is one way to do it. Defining a Laplace prior (Lasso) is another way. Even if you learn the hyperparameters, you still have to define what distribution the parameters follow.

By "expert judgement" I simply mean it's a choice the researcher makes. It doesn't have to be subjective.
#15165865
Rugoz wrote:Throwing away variables and thereby reducing the degrees of freedom (I presume this is in principle equivalent to throwing away units in the synthetic control method) is a way to make the model less prone to overfitting, i.e. you reduce P relative to N. Throwing away part of the sample would reduce N; that's not what I meant.

The alternative is to limit the parameter space, i.e. to reduce the freedom of the parameters. Saying the weights must sum up to one is one way to do it. Defining a Laplace prior (Lasso) is another way. Even if you learn the hyperparameters, you still have to define what distribution the parameters follow.

By "expert judgement" I simply mean it's a choice the researcher makes. It doesn't have to be subjective.


You have to consider that in this case, the inclusion of units is also a parameter. To fix ideas, citing from the 2010 paper, the parameters it will control are both those of the Jx1 vector W and the covariate weights of the kxk matrix V. In the implementation (section 2.3) the proposal is to minimize sqrt[(X1-X0W)'V(X1-X0W)], where X1 is a kx1 vector with the covariate values and a number of weighted pre-treatment values of the outcome for the treated unit, and X0 is the kxJ matrix of the analogous vectors for the J untreated units (each stacked as a column vector, basically).

When you optimize, you want to choose W and V to minimize the pre-treatment mean squared error. To prevent overfitting, you can and indeed should go on to regularize, and depending on the method you can achieve a sparser result - not just in terms of V but in terms of W too.
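To fix ideas further, here's a toy sketch of the inner step in Python (made-up data, equal covariate weights for V - not the actual implementation from any of the papers):

```python
import numpy as np
from scipy.optimize import minimize

# Given covariate weights V, choose donor weights W on the simplex to
# minimize (X1 - X0 W)' V (X1 - X0 W); minimizing this quadratic form is
# equivalent to minimizing its square root. All data here are invented.
rng = np.random.default_rng(0)
k, J = 6, 10                              # k covariates, J donor units
X0 = rng.normal(size=(k, J))              # donor covariates (columns = units)
X1 = X0 @ rng.dirichlet(np.ones(J))       # treated unit: a convex combination
V = np.eye(k) / k                         # equal covariate weights, for show

def loss(w):
    d = X1 - X0 @ w
    return d @ V @ d

# W constrained to be non-negative and to sum to one (a convex combination)
cons = {"type": "eq", "fun": lambda w: w.sum() - 1.0}
res = minimize(loss, np.ones(J) / J, bounds=[(0, 1)] * J, constraints=cons)
w_hat = res.x                             # fitted donor weights
```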
#15166005
wat0n wrote:You have to consider that in this case, the inclusion of units is also a parameter. To fix ideas, citing from the 2010 paper, the parameters it will control are both those of the Jx1 vector W and the covariate weights of the kxk matrix V. In the implementation (section 2.3) the proposal is to minimize sqrt[(X1-X0W)'V(X1-X0W)], where X1 is a kx1 vector with the covariate values and a number of weighted pre-treatment values of the outcome for the treated unit, and X0 is the kxJ matrix of the analogous vectors for the J untreated units (each stacked as a column vector, basically).

When you optimize, you want to choose W and V to minimize the pre-treatment mean squared error. To prevent overfitting, you can and indeed should go on to regularize, and depending on the method you can achieve a sparser result - not just in terms of V but in terms of W too.


Right, so reducing the number of units helps with the overfitting issue, i.e. what Abadie says.

Here he proposes to divide the pre-treatment period into a training and validation period to pick the optimal V (kx1 vector of predictor weights).
https://economics.mit.edu/files/17847

What he doesn't say is how to pick the candidates for V. Given the very limited data available in the problems considered, that candidate pool has to be small if you want to avoid overfitting the validation data.
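For concreteness, the training/validation idea looks roughly like this (a toy sketch with made-up data and an arbitrary candidate list - how the candidates are generated is exactly the part left unspecified):

```python
import numpy as np
from scipy.optimize import minimize

# For each candidate set of predictor weights, fit the donor weights W
# using the training years, then score the candidate by how well the
# resulting synthetic control tracks the outcome in the validation years.
rng = np.random.default_rng(3)
k, J, T = 3, 8, 20                        # covariates, donors, pre-treatment years
X0 = rng.normal(size=(k, J))              # donor covariates
X1 = X0 @ rng.dirichlet(np.ones(J))       # treated unit's covariates
Y0 = rng.normal(size=(T, J)).cumsum(0)    # donor outcome paths
Y1 = Y0 @ rng.dirichlet(np.ones(J))       # treated unit's outcome path
train, valid = slice(0, 10), slice(10, 20)

def fit_w(v):
    # Match on covariates plus the training-period mean outcome (a crude
    # stand-in for the paper's weighted pre-treatment outcome values).
    A0 = np.vstack([X0, Y0[train].mean(0, keepdims=True)])
    A1 = np.append(X1, Y1[train].mean())
    V = np.diag(np.append(v, 1.0))
    obj = lambda w: (A1 - A0 @ w) @ V @ (A1 - A0 @ w)
    cons = {"type": "eq", "fun": lambda w: w.sum() - 1.0}
    return minimize(obj, np.ones(J) / J, bounds=[(0, 1)] * J,
                    constraints=cons).x

candidates = [np.ones(k) / k,             # an arbitrary, hand-picked list
              np.array([0.6, 0.3, 0.1]),
              np.array([0.1, 0.3, 0.6])]
scores = [np.mean((Y1[valid] - Y0[valid] @ fit_w(v)) ** 2) for v in candidates]
best = candidates[int(np.argmin(scores))]  # V with the best validation fit
```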
#15166010
Rugoz wrote:Right, so reducing the number of units helps with the overfitting issue, i.e. what Abadie says.

Here he proposes to divide the pre-treatment period into a training and validation period to pick the optimal V (kx1 vector of predictor weights).
https://economics.mit.edu/files/17847

What he doesn't say is how to pick the candidates for V. Given the very limited data available in the problems considered, that candidate pool has to be small if you want to avoid overfitting the validation data.


You can regularize to do it. You can also do the same to choose units if you don't want to rely on subjective criteria ("expert judgment"), as mentioned in section 2.2. For instance, you can change the objective function by adding an elastic net penalty to this weighted distance if you want.
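Schematically, something like this hypothetical toy sketch - the penalty strength `lam` and the L1/L2 mix `ratio` are invented, and the sum-to-one constraint is dropped in favour of the penalty:

```python
import numpy as np
from scipy.optimize import minimize

# Replace the "weights sum to one" restriction with an elastic-net style
# penalty on the donor weights, so uninformative donors get shrunk toward
# zero by the optimizer rather than removed by hand. Toy data throughout.
rng = np.random.default_rng(2)
k, J = 6, 12
X0 = rng.normal(size=(k, J))              # donor covariates
X1 = rng.normal(size=k)                   # treated unit's covariates
lam, ratio = 0.5, 0.5                     # assumed penalty strength and L1/L2 mix

def penalized_loss(w):
    d = X1 - X0 @ w
    l1, l2 = np.abs(w).sum(), w @ w
    return d @ d + lam * (ratio * l1 + (1 - ratio) * l2)

# Non-negativity is kept; sparsity now comes from the L1 part of the penalty.
res = minimize(penalized_loss, np.ones(J) / J, bounds=[(0, None)] * J)
w_hat = res.x
```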

Of course, I don't expect an undergrad to do it if the software implementation doesn't include something like that. But then I can't be expected to make policy recommendations out of that kind of paper either.
#15166054
late wrote:You missed a small point, inflation reduces the value of debt by reducing the value of a dollar.

It's also liable to make interest rates higher in the future if the government wants to borrow money from the private sector. If it borrows from itself, on the other hand, that creates an inflationary cost.

No one's going to lend money to government if they know it's just going to be inflated away.


This idea people have that they can just make the debt go away with inflation is illogical insanity.
I mean seriously, it's like the type of idea that would come from a little child.

Inflating your way out of debt won't really hurt your credit rating any less than an outright default would (it's as if, instead of inflation, we just told everyone they're not going to get 2% of the money they're owed).
