Replace “independent variables” with “modeling variables”

The meanings of the words “dependent” and “independent” are not difficult to understand. Independent simply means not dependent. It could refer to a person, a molecule, a cell, a country, or to any kind of object. The point is that it is not influenced or controlled by anything except itself. It is IN-dependent. Of course, no person or nation can be completely independent, and there are degrees of dependence and independence. This is the first problem with these words: they have a black-or-white character.

The second problem appears when the words are used in the context of modeling. They guide us to think about cause-and-effect. It seems very reasonable to believe that the “dependent variables” depend on something. And because there are no other variables around, the reader is guided to think that they depend on the “independent variables”.

Although we can already see that there is a problem with these words, let us do our best to understand them as they are used in data modeling, and to make them fit a modeling context.

“Independent” could mean that the variables do not depend on anything at all. But what could be an example of such variables? Think about this and you will realize that only random variables are like that. Random variables are by definition not caused or affected by anything at all! So the words “independent variables” can be used for random numbers. But in data modeling, we usually don’t work with random numbers.

“Independent” could also mean that there is no correlation between the variables, that is, that the variables are independent of each other. The level of one variable is then not related to the level of any of the other independent variables. Such sets of independent variables can be created with designed experiments, where only selected variables having (close to) zero correlation are allowed. An interesting point in this context is that, besides the designed variables, random variables also have (close to) zero correlation. But again, we usually don’t work with random numbers in data modeling.
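To make the point about designed experiments concrete, here is a minimal sketch. It assumes a two-level full factorial design with three variables; the names A, B and C and the coding of the levels as -1 and +1 are placeholders chosen for illustration, not taken from any particular study. It computes the Pearson correlation between each pair of design columns.

```python
from itertools import combinations, product
from math import sqrt


def pearson(xs, ys):
    """Pearson correlation coefficient of two equally long sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / sqrt(sxx * syy)


# Hypothetical two-level full factorial design: every combination of
# low (-1) and high (+1) settings for three variables, 8 runs in total.
design = list(product([-1, 1], repeat=3))
names = ["A", "B", "C"]  # placeholder variable names
columns = {name: [run[i] for run in design] for i, name in enumerate(names)}

# Correlation between every pair of design columns.
correlations = {
    (a, b): pearson(columns[a], columns[b]) for a, b in combinations(names, 2)
}

for pair, r in correlations.items():
    print(pair, r)
```

In this balanced design every pair of columns has a correlation of zero, which is the narrow sense in which designed variables are “independent of each other”. Variables observed outside such an experiment will rarely behave this way.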

The third problem with the words “dependent” and “independent” is therefore that, no matter how we try to make “independent variables” understandable, they relate only to tightly controlled experiments or to random variables. Fortunately, nature has more to offer than this. Fortunately, the world is not black and white.

The way out of the swamp is to stop using “independent” and “dependent” and to use other words. One way is to use “predictor variables” instead of “independent variables” and “response variables” instead of “dependent variables”. I have often copied these from the literature and I have used them myself in my own scientific writing, but I never felt completely happy about them. The reason is that they have a tendency to guide the reader towards cause-and-effect thinking.

The better way would be to use “modeling variables” for the variables that make up the model instead! “Model” is a very good word to explain that something is a simplification. “It’s not the real thing, it’s a model”. Even better, “modeling variables” are simplifications in TWO ways: 1) they are a simplification in themselves, because not all variables of the universe are included, and 2) there will always be a part of the variables that does not relate to the system under study or that cannot be described, for example random errors. Therefore, when creating a mathematical model from the modeling variables, a further step of simplification/modeling is being taken. Again, it is great that the word “model” is used. It is clear from the beginning that it is an approximation. Another good point is that “modeling variables” is a neutral expression, clearly stating that it is NOT about cause-and-effect.

Finally, use “predicted variables” or “estimated variables” if we are trying to calculate other variables using, for example, regression. In this way, we will say that the modeling variables can be used to obtain an estimated or predicted variable.
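As a small illustration of this vocabulary, the sketch below fits a straight line by ordinary least squares and names the quantities accordingly. The numbers are invented for the example, and the names `fit_line`, `modeling_variable`, `measured` and `predicted_variable` are my own placeholders. Nothing in the calculation says that one variable causes the other.

```python
def fit_line(xs, ys):
    """Ordinary least-squares fit of ys = intercept + slope * xs."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


# Invented example data (not from any real study).
modeling_variable = [1.0, 2.0, 3.0, 4.0, 5.0]
measured = [2.1, 3.9, 6.2, 7.8, 10.1]

slope, intercept = fit_line(modeling_variable, measured)

# The model turns the modeling variable into a predicted variable.
predicted_variable = [intercept + slope * x for x in modeling_variable]

print("slope:", round(slope, 3), "intercept:", round(intercept, 3))
print("predicted:", [round(p, 2) for p in predicted_variable])
```

The predicted values are only an approximation of the measured ones, which is exactly what the word “model” is meant to signal.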
