Factor regression model

Within statistical factor analysis, the factor regression model,[1] or hybrid factor model,[2] is a special multivariate model with the following form:

$$\mathbf{y}_n = \mathbf{A}\mathbf{x}_n + \mathbf{B}\mathbf{z}_n + \mathbf{c} + \mathbf{e}_n$$

where

$\mathbf{y}_n$ is the $n$-th $G \times 1$ (known) observation.
$\mathbf{x}_n$ is the $L_x \times 1$ vector of (unknown) hidden factors for the $n$-th sample.
$\mathbf{A}$ is the (unknown) $G \times L_x$ loading matrix of the hidden factors.
$\mathbf{z}_n$ is the $L_z \times 1$ vector of (known) design factors for the $n$-th sample.
$\mathbf{B}$ is the (unknown) $G \times L_z$ matrix of regression coefficients of the design factors.
$\mathbf{c}$ is the (unknown) $G \times 1$ vector of constant terms, or intercept.
$\mathbf{e}_n$ is the (unknown) $G \times 1$ vector of errors, often white Gaussian noise.
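
As an illustration of how the terms fit together, the following is a minimal NumPy sketch that simulates observations from the model. The dimensions, the random seed, the noise scale, and the standard-normal draws for every quantity are assumptions made only for this example; they are not part of the model definition.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical sizes: G observed variables, L_x hidden factors,
# L_z design factors, N samples.
G, L_x, L_z, N = 5, 2, 3, 100

A = rng.normal(size=(G, L_x))       # loading matrix of the hidden factors
B = rng.normal(size=(G, L_z))       # regression coefficients of the design factors
c = rng.normal(size=(G, 1))         # intercept, broadcast across samples
X = rng.normal(size=(L_x, N))       # hidden factors, one column per sample
Z = rng.normal(size=(L_z, N))       # design factors, one column per sample
E = 0.1 * rng.normal(size=(G, N))   # white Gaussian noise (assumed scale)

# Column n of Y is y_n = A x_n + B z_n + c + e_n.
Y = A @ X + B @ Z + c + E
```

In an actual analysis only `Y` and `Z` would be observed; `A`, `B`, `c`, and `X` would have to be estimated.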

Relationship between factor regression model, factor model and regression model

The factor regression model can be viewed as a combination of the factor analysis model ($\mathbf{y}_n = \mathbf{A}\mathbf{x}_n + \mathbf{c} + \mathbf{e}_n$) and the regression model ($\mathbf{y}_n = \mathbf{B}\mathbf{z}_n + \mathbf{c} + \mathbf{e}_n$).
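
To make the regression special case concrete, the sketch below drops the hidden-factor term and recovers the coefficients and intercept by ordinary least squares. Ordinary least squares is used here only as a familiar estimator for the regression model; it is an assumption of this example, not a method prescribed for the full factor regression model, and all sizes and noise levels are hypothetical.

```python
import numpy as np

rng = np.random.default_rng(1)

# Hypothetical sizes for the regression-only case (no hidden factors).
G, L_z, N = 4, 2, 500

B = rng.normal(size=(G, L_z))        # true regression coefficients
c = rng.normal(size=(G, 1))          # true intercept
Z = rng.normal(size=(L_z, N))        # known design factors
E = 0.05 * rng.normal(size=(G, N))   # noise (assumed scale)

# Regression model: y_n = B z_n + c + e_n.
Y = B @ Z + c + E

# Append a row of ones so the intercept is estimated jointly with B.
Z_aug = np.vstack([Z, np.ones((1, N))])                # (L_z + 1, N)
coef, *_ = np.linalg.lstsq(Z_aug.T, Y.T, rcond=None)   # (L_z + 1, G)

B_hat = coef[:L_z].T   # estimated coefficients, shape (G, L_z)
c_hat = coef[L_z:].T   # estimated intercept, shape (G, 1)
```

When the hidden-factor term is present, a plain regression of this kind no longer accounts for all structured variation in the observations.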

Alternatively, the model can be viewed as a special kind of factor model, the hybrid factor model:[2]

$$\begin{aligned}\mathbf{y}_n &= \mathbf{A}\mathbf{x}_n + \mathbf{B}\mathbf{z}_n + \mathbf{c} + \mathbf{e}_n \\ &= \begin{bmatrix}\mathbf{A} & \mathbf{B}\end{bmatrix}\begin{bmatrix}\mathbf{x}_n \\ \mathbf{z}_n\end{bmatrix} + \mathbf{c} + \mathbf{e}_n \\ &= \mathbf{D}\mathbf{f}_n + \mathbf{c} + \mathbf{e}_n\end{aligned}$$

where $\mathbf{D} = \begin{bmatrix}\mathbf{A} & \mathbf{B}\end{bmatrix}$ is the loading matrix of the hybrid factor model and $\mathbf{f}_n = \begin{bmatrix}\mathbf{x}_n \\ \mathbf{z}_n\end{bmatrix}$ is the vector of factors, comprising both the unknown (hidden) factors and the known (design) factors.
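
The equivalence of the two forms can be checked numerically. The following is a small sketch, with hypothetical dimensions and random values, showing that stacking the loading matrices side by side and the factor vectors on top of each other reproduces the sum of the two terms.

```python
import numpy as np

rng = np.random.default_rng(2)

# Hypothetical sizes.
G, L_x, L_z = 4, 2, 3

A = rng.normal(size=(G, L_x))     # loadings of the hidden factors
B = rng.normal(size=(G, L_z))     # coefficients of the design factors
x_n = rng.normal(size=(L_x, 1))   # hidden factors for one sample
z_n = rng.normal(size=(L_z, 1))   # design factors for one sample

D = np.hstack([A, B])             # D = [A B], shape (G, L_x + L_z)
f_n = np.vstack([x_n, z_n])       # f_n = [x_n; z_n], shape (L_x + L_z, 1)

separate = A @ x_n + B @ z_n      # factor term plus regression term
stacked = D @ f_n                 # hybrid factor form

print(np.allclose(separate, stacked))
```

The intercept and error terms are omitted because they are identical in both forms.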

Software

Open-source software for performing factor regression is available.

References

  1. Carvalho, Carlos M. (1 December 2008). "High-Dimensional Sparse Factor Modeling: Applications in Gene Expression Genomics". Journal of the American Statistical Association. 103 (484): 1438–1456. doi:10.1198/016214508000000869. PMC 3017385. PMID 21218139.
  2. Meng, J. (2011). "Uncover cooperative gene regulations by microRNAs and transcription factors in glioblastoma using a nonnegative hybrid factor model". International Conference on Acoustics, Speech and Signal Processing. Archived from the original on 2011-11-23.