;+
; :NAME:
; GENERATE_CORRELATED_DATA
;
; :PURPOSE:
; This function will return a list of k normally-distributed,
; correlated, n-dimensional random points [with a specified
; mean vector and set of univariate standard deviations].
;
; :CATEGORY:
; Statistics.
;
; :CALLING SEQUENCE:
; Result = $
; GENERATE_CORRELATED_DATA, seed,
; numberPoints,
; correlationMatrix,
; MEAN_VECTOR=meanVector,
; STANDARD_DEVIATION=standardDeviation,
;
; :INPUTS:
; seed
; A variable or constant (long integer) used to initialize
; the random sequence on input, and in which the state of the
; random number generator is saved on output.
; numberPoints
; A variable or constant containing the number of n-dimensional
; values to be produced.
; correlationMatrix
; A two-dimensional matrix describing the correlation to be
; exhibited between variables. This matrix will also define
; the dimensionality of the produced data set (for example, a
; 3x3 correlation matrix will cause 3-dimensional data to be
; generated).
;
; :KEYWORD PARAMETERS:
; MEAN_VECTOR
; A vector defining the mean for each dimension of the generated
; data set. If omitted, the mean vector will be filled with 0's.
; STANDARD_DEVIATION
; A vector defining the standard deviation for each dimension of
; the generated data set. If omitted, the standard deviation vector
; will be filled with 1's.
;
; :RETURN VALUE:
; A matrix containing multi-dimensional, normally-distributed,
; correlated random values. The random values will be contained
; in the rows of the returned matrix.
;
; If the dimensions of the provided descriptive statistics are
; incompatible, a scaler value of -1 will be returned.
;
; If the dimensions of the data set, as determined from the
; provided correlation matrix is less than 2, then a scalar
; value of -2 will be returned.
;
; :SIDE EFFECTS:
; If the seed is specifed as any data type besides a long integer, it
; will be truncated and changed to this data type.
;
; :MODIFICATION HISTORY:
; Written by: Carl Salvaggio
; April, 2009 Original code
;
; :DISCLAIMER:
; This source code is provided "as is" and without warranties as to performance
; or merchantability. The author and/or distributors of this source code may
; have made statements about this source code. Any such statements do not
; constitute warranties and shall not be relied on by the user in deciding
; whether to use this source code.
;
; This source code is provided without any express or implied warranties
; whatsoever. Because of the diversity of conditions and hardware under which
; this source code may be used, no warranty of fitness for a particular purpose
; is offered. The user is advised to test the source code thoroughly before
; relying on it. The user must assume the entire risk of using the source code.
;-
FUNCTION GENERATE_CORRELATED_DATA, seed, $
numberPoints, $
correlationMatrix, $
MEAN_VECTOR=meanVector, $
STANDARD_DEVIATION=standardDeviation
;+
; Determine the dimensionality of the points to be produced from
; the provided correlation matrix
;-
numberDimensions = SQRT( N_ELEMENTS( correlationMatrix ) )
;+
; Determine if optional parameters were specified, and if not,
; insert default values
;-
IF NOT KEYWORD_SET( meanVector ) THEN BEGIN
meanVector = REPLICATE( 0D, numberDimensions )
ENDIF
numberMeans = N_ELEMENTS( meanVector )
IF NOT KEYWORD_SET( standardDeviation ) THEN BEGIN
standardDeviation = REPLICATE( 1D, numberDimensions )
ENDIF
numberStandardDeviations = N_ELEMENTS( standardDeviation )
;+
; Make sure that the dimensions of the provided correlation
; matrix and the provided mean and standard deviation vectors
; are compatible
;-
IF ( numberMeans NE numberStandardDeviations ) OR $
( numberStandardDeviations NE numberDimensions ) OR $
( numberDimensions NE numberMeans ) THEN BEGIN
PRINT, ""
PRINT, "%GENERATE_CORRELATED_DATA: mean/standard deviation/correlation dimensions incompatable"
PRINT, ""
RETURN, -1
ENDIF
IF ( numberDimensions LT 2 ) THEN BEGIN
PRINT, ""
PRINT, "%GENERATE_CORRELATED_DATA: Must have statistics for at least 2 dimensions"
PRINT, ""
RETURN, -2
ENDIF
;+
; Generated a numberDimensions-by-numberPoints matrix of unit normal
; random values.
;-
seed = LONG( seed )
data = RANDOMN( seed, numberDimensions, numberPoints, /DOUBLE )
;+
; Perform a Cholesky decomposition of the upper-triangular form
; of the provided correlation matrix
;-
u = DBLARR( numberDimensions, numberDimensions )
FOR j = 0, numberDimensions-1 DO BEGIN
FOR i = j, numberDimensions-1 DO BEGIN
u[i,j] = correlationMatrix[i,j]
ENDFOR
ENDFOR
LA_CHOLDC, u, /UPPER, /DOUBLE
;+
; Using the Cholesky decomposition of the correlation matrix,
; transform the previously generated unit normal random values
; to a correlated data set (based on the provided correlation
; matrix)
;-
correlatedData = data ## u
;+
; Modify the unit normal correlated random values to exhibit the
; mean and standard deviation vectors provided
;-
gain = TRANSPOSE( REPLICATE( 1, numberPoints ) ) ## standardDeviation
offset = TRANSPOSE( REPLICATE( 1, numberPoints ) ) ## meanVector
correlatedData = correlatedData * gain + offset
;+
; Return the generated data set to the calling routine
;-
RETURN, correlatedData
END