Variable aggregation Granular computing



A Watanabe-Kraskov variable agglomeration tree. Variables are agglomerated (or "unitized") from the bottom up, with each merge node representing a (constructed) variable whose entropy equals the joint entropy of the agglomerating variables. Thus, the agglomeration of two m-ary variables $x_{1}$ and $x_{2}$ having individual entropies $H(x_{1})$ and $H(x_{2})$ yields a single $m^{2}$-ary variable $x_{1,2}$ with entropy $H(x_{1,2}) = H(x_{1}, x_{2})$. When $x_{1}$ and $x_{2}$ are highly dependent (i.e., redundant) and have large mutual information $I(x_{1}; x_{2})$, then $H(x_{1,2}) \ll H(x_{1}) + H(x_{2})$ because $H(x_{1}, x_{2}) = H(x_{1}) + H(x_{2}) - I(x_{1}; x_{2})$, and this would be considered a parsimonious unitization or aggregation.
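The entropy identity in the caption can be checked numerically on a small sample. The following is a minimal sketch, assuming simple plug-in (frequency-count) estimates of entropy and mutual information; the function names `entropy` and `mutual_information` and the toy data are illustrative and do not come from the cited sources.

```python
from collections import Counter
from math import log2


def entropy(values):
    """Plug-in Shannon entropy (in bits) of a sequence of hashable values."""
    n = len(values)
    return -sum((c / n) * log2(c / n) for c in Counter(values).values())


def mutual_information(a, b):
    """Plug-in mutual information (in bits), computed directly from the
    empirical joint and marginal frequencies of two equal-length sequences."""
    n = len(a)
    pa, pb, pab = Counter(a), Counter(b), Counter(zip(a, b))
    return sum(
        (c / n) * log2(c * n / (pa[u] * pb[v])) for (u, v), c in pab.items()
    )


# Toy data (hypothetical): two binary variables (m = 2) that agree
# in 7 of 8 observations, so they are strongly redundant.
x1 = [0, 0, 0, 0, 1, 1, 1, 1]
x2 = [0, 0, 0, 1, 1, 1, 1, 1]

# The agglomerated variable takes one value per observed combination,
# so it has at most m**2 = 4 distinct values.
x12 = list(zip(x1, x2))

h1, h2, h12 = entropy(x1), entropy(x2), entropy(x12)
mi = mutual_information(x1, x2)

print(f"H(x1) + H(x2)            = {h1 + h2:.4f} bits")
print(f"H(x1,2)                  = {h12:.4f} bits")
print(f"H(x1) + H(x2) - I(x1;x2) = {h1 + h2 - mi:.4f} bits")
```

On data like this, the last two printed values should coincide, and both should fall below the sum of the individual entropies by exactly the mutual information.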


Similarly, it is reasonable to ask whether a large set of variables might be aggregated into a smaller set of prototype variables that capture the most salient relationships between the variables. Although variable clustering methods based on linear correlation have been proposed (Duda, Hart & Stork 2001; Rencher 2002), more powerful methods of variable clustering are based on the mutual information between variables. Watanabe has shown (Watanabe 1960; Watanabe 1969) that for any set of variables one can construct a polytomic (i.e., n-ary) tree representing a series of variable agglomerations in which the ultimate "total" correlation among the complete variable set is the sum of the "partial" correlations exhibited by each agglomerating subset (see figure). Watanabe suggests that an observer might thus seek to partition a system in such a way as to minimize the interdependence between the parts, as if looking for a natural division or a hidden crack.
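Watanabe's decomposition can be written out for a small case. The following is a sketch in standard information-theoretic notation, where $H$ denotes Shannon entropy, $I$ mutual information, and $C$ total correlation; the three-variable tree is an illustrative assumption, not an example taken from the cited works.

$$
\begin{aligned}
C(x_{1},\dots,x_{n}) &= \sum_{i=1}^{n} H(x_{i}) - H(x_{1},\dots,x_{n}) \\
C(A \cup B) &= C(A) + C(B) + I(A; B) \quad \text{for disjoint variable subsets } A, B \\
C(x_{1},x_{2},x_{3}) &= I(x_{1}; x_{2}) + I(x_{1,2}; x_{3}) \quad \text{for the tree } ((x_{1}, x_{2}), x_{3})
\end{aligned}
$$

The second line follows by expanding each term with the definition in the first line and using $I(A;B) = H(A) + H(B) - H(A,B)$. The third line applies it twice, noting that a single variable has zero total correlation and that $x_{1,2}$ denotes the agglomeration of $x_{1}$ and $x_{2}$.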


One practical approach to building such a tree is to successively choose for agglomeration the two variables (either atomic variables or previously agglomerated variables) that have the highest pairwise mutual information (Kraskov et al. 2003). The product of each agglomeration is a new (constructed) variable that reflects the local joint distribution of the two agglomerating variables, and thus possesses an entropy equal to their joint entropy. From a procedural standpoint, this agglomeration step involves replacing two columns in the attribute-value table, representing the two agglomerating variables, with a single column that has a unique value for every unique combination of values in the replaced columns (Kraskov et al. 2003). No information is lost by such an operation; however, it should be noted that if one is exploring the data for inter-variable relationships, it would generally not be desirable to merge redundant variables in this way, since in such a context it is likely to be precisely the redundancy or dependency between variables that is of interest; and once redundant variables are merged, their relationship to one another can no longer be studied.
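The greedy procedure described above can be sketched for a small table of discrete columns. This is a minimal illustration, assuming plug-in (frequency-count) estimates of entropy and mutual information, which may differ from the estimators used by Kraskov et al.; the names `agglomerate`, `entropy`, `mutual_information` and the toy table are hypothetical.

```python
from collections import Counter
from itertools import combinations
from math import log2


def entropy(values):
    """Plug-in Shannon entropy (in bits) of a sequence of hashable values."""
    n = len(values)
    return -sum((c / n) * log2(c / n) for c in Counter(values).values())


def mutual_information(a, b):
    """Plug-in mutual information via H(a) + H(b) - H(a, b)."""
    return entropy(a) + entropy(b) - entropy(list(zip(a, b)))


def agglomerate(table):
    """Greedily merge the pair of columns with the highest mutual information.

    `table` maps a column name to a list of discrete values (all lists the
    same length). Returns the merges in the order performed, as
    (name_a, name_b, mutual_information) tuples.
    """
    columns = dict(table)  # work on a copy; the caller's table is untouched
    merges = []
    while len(columns) > 1:
        a, b = max(
            combinations(columns, 2),
            key=lambda pair: mutual_information(columns[pair[0]], columns[pair[1]]),
        )
        mi = mutual_information(columns[a], columns[b])
        # Replace the two columns with one column holding a distinct value
        # (here, a tuple) for every observed combination of their values.
        merged = list(zip(columns.pop(a), columns.pop(b)))
        columns[(a, b)] = merged
        merges.append((a, b, mi))
    return merges


# Toy attribute-value table (hypothetical): "b" is a noisy copy of "a",
# while "c" is independent of "a" in this sample.
table = {
    "a": [0, 0, 1, 1, 0, 0, 1, 1],
    "b": [0, 0, 1, 1, 0, 1, 1, 1],
    "c": [0, 1, 0, 1, 0, 1, 0, 1],
}

merges = agglomerate(table)
for left, right, mi in merges:
    print(f"merge {left!r} + {right!r}  (I = {mi:.4f} bits)")
```

Because each merged column carries the joint distribution of its two parents, the mutual information values recorded at the merge nodes add up to the total correlation of the full variable set, which is the additive property attributed to Watanabe above. Note that the merged column is lossless but hides the relationship between its parents, which is the caveat raised in the preceding paragraph.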






