Compute the optimal encoding for categorical functional data using an extension of multiple correspondence analysis to stochastic processes.

compute_optimal_encoding(
  data,
  basisobj,
  computeCI = TRUE,
  nBootstrap = 50,
  propBootstrap = 1,
  method = c("precompute", "parallel"),
  verbose = TRUE,
  nCores = max(1, ceiling(detectCores()/2)),
  ...
)

Arguments

data

data.frame with three columns: id (identifier of the trajectory), time (time at which a state change occurs) and state (state associated with that time). All individuals must begin at the same time T0 and end at the same time Tmax (use cut_data).

basisobj

basis created using the fda package (cf. create.basis).

computeCI

if TRUE, perform a bootstrap to estimate the variance of the encoding function coefficients

nBootstrap

number of bootstrap samples

propBootstrap

size of each bootstrap sample as a proportion of the number of individuals: each sample contains propBootstrap * number of individuals trajectories

method

computation method, either "precompute" or "parallel". "precompute" computes all integrals in advance (efficient when the number of unique time values is low); "parallel" spreads the computation over nCores cores.

verbose

if TRUE, print information about the progress of the computation

nCores

number of cores used for parallelization (only if method == "parallel"). Defaults to half of the detected cores, rounded up, with a minimum of 1.

...

additional parameters passed to the integrate function (see Details).
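
The layout expected for data can be sketched as follows. The toy values and the helper name check_time_range are illustrative assumptions, not part of cfda; the check only mirrors the stated requirement that all trajectories share the same T0 and Tmax.

```r
# Toy trajectories in the long format described for `data`:
# one row per state change, with columns id, time and state.
toy <- data.frame(
  id    = c(1, 1, 1, 2, 2, 3),
  time  = c(0, 1.2, 5, 0, 3.4, 0),
  state = c("A", "C", "C", "G", "T", "A")
)

# Hypothetical helper (not a cfda function): TRUE when every trajectory
# starts at the same time and ends at the same time.
check_time_range <- function(d) {
  first <- tapply(d$time, d$id, min)
  last  <- tapply(d$time, d$id, max)
  length(unique(first)) == 1 && length(unique(last)) == 1
}

same_range <- check_time_range(toy)
same_range # FALSE: the last recorded times of individuals 2 and 3 are before 5
```

When such a check fails, cut_data(toy, 5) is the call this page points to for aligning the end times before computing the encoding.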

Value

A list containing:

  • eigenvalues eigenvalues

  • alpha optimal encoding coefficients associated with each eigenvector

  • pc principal components

  • F matrix containing the \(F_{(x,i)(y,j)}\)

  • V matrix containing the \(V_{(x,i)}\)

  • G covariance matrix of V

  • basisobj basisobj input parameter

  • pt output of estimate_pt function

  • bootstrap Only if computeCI = TRUE. Output of every bootstrap run

  • varAlpha Only if computeCI = TRUE. Variance of alpha parameters

  • runTime Total elapsed time
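
As a small illustration of working with the returned list, a cumulative proportion of variance can be derived from the eigenvalues element. The helper cumulative_variance and the toy eigenvalues are assumptions made for illustration; the "Explained variance" line printed by summary in the Examples appears consistent with this cumulative share, but that is an inference from the output, not a documented guarantee.

```r
# Hypothetical helper (not a cfda function): cumulative share of the
# total variance carried by the leading eigenvalues.
cumulative_variance <- function(eigenvalues) {
  cumsum(eigenvalues) / sum(eigenvalues)
}

# Toy eigenvalues, chosen only to make the arithmetic easy to follow.
toy_eigenvalues <- c(2, 1, 1)
cum_var <- cumulative_variance(toy_eigenvalues)
cum_var
# 0.50 0.75 1.00

# With a fitted object (assumed to be named `encoding`):
# cumulative_variance(encoding$eigenvalues)
```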

Details

See the vignette for the mathematical background: RShowDoc("cfda", package = "cfda")

Extra parameters (...) for the integrate function can be:

  • subdivisions the maximum number of subintervals.

  • rel.tol relative accuracy requested.

  • abs.tol absolute accuracy requested.
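
These three names are arguments of stats::integrate in base R. The sketch below shows them on a standalone integral; the integrand is an arbitrary illustration, and the commented call at the end assumes that the objects d_JK2 and b exist as in the Examples.

```r
# Integrate t^2 over [0, 5]; the exact value is 125 / 3.
res <- integrate(
  function(t) t^2,
  lower = 0, upper = 5,
  subdivisions = 200L, # maximum number of subintervals
  rel.tol = 1e-8,      # relative accuracy requested
  abs.tol = 1e-8       # absolute accuracy requested
)
res$value
abs(res$value - 125 / 3) < 1e-6

# The same names can be forwarded through `...`, for example:
# compute_optimal_encoding(d_JK2, b, computeCI = FALSE, rel.tol = 1e-8)
```

Tighter tolerances or more subdivisions may increase the run time, so it is probably worth changing them only when the numerical integration reports a problem.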

References

  • Deville J.C. (1982) Analyse de données chronologiques qualitatives : comment analyser des calendriers ?, Annales de l'INSEE, No 45, p. 45-104.

  • Deville J.C. and Saporta G. (1980) Analyse harmonique qualitative, DIDAY et al. (editors), Data Analysis and Informatics, North Holland, p. 375-389.

  • Saporta G. (1981) Méthodes exploratoires d'analyse de données temporelles, Cahiers du B.U.R.O, Université Pierre et Marie Curie, 37-38, Paris.

  • Preda C, Grimonprez Q, Vandewalle V. Categorical Functional Data Analysis. The cfda R Package. Mathematics. 2021; 9(23):3074. https://doi.org/10.3390/math9233074

Author

Cristian Preda, Quentin Grimonprez

Examples

library(cfda)
library(fda) # provides create.bspline.basis

# Simulate the Jukes-Cantor model of nucleotide replacement
K <- 4
Tmax <- 5
PJK <- matrix(1 / 3, nrow = K, ncol = K) - diag(rep(1 / 3, K))
lambda_PJK <- c(1, 1, 1, 1)
d_JK <- generate_Markov(
  n = 10, K = K, P = PJK, lambda = lambda_PJK, Tmax = Tmax,
  labels = c("A", "C", "G", "T")
)
d_JK2 <- cut_data(d_JK, Tmax)

# create basis object
m <- 5
b <- create.bspline.basis(c(0, Tmax), nbasis = m, norder = 4)

# compute encoding
encoding <- compute_optimal_encoding(d_JK2, b, computeCI = FALSE, nCores = 1)
#> ######### Compute encoding #########
#> Number of individuals: 10
#> Number of states: 4
#> Basis type: bspline
#> Number of basis functions: 5
#> Number of cores: 1
#> 
#>   |==================================================| 100% elapsed=00s, remaining~00s
#> 
#> DONE in 0.11s
#> ---- Compute U matrix:
#> 
#>   |==================================================| 100% elapsed=00s, remaining~00s
#> 
#> DONE in 0.45s
#> ---- Compute encoding: 
#> DONE in 0.01s
#> Run Time: 0.57s
summary(encoding)
#> #### FMCA
#> 
#> ## Data 
#> Number of individuals: 10 
#> Number of states: 4 
#> Time Range: 0 to 5 
#> States:  A C G T 
#> 
#> ## Basis 
#> Type: bspline 
#> Number of basis functions: 5 
#> 
#> ## Outputs
#> Eigenvalues:
#>   2.218392 1.937556 1.784195 1.437139 1.006474 0.5203247 
#> 
#> Explained variance:
#>   0.233 0.436 0.624 0.774 0.88 0.935 
#> 
#> Optimal encoding:
#>                A          C          G          T
#> [1,] -0.17857528 -2.9546575 -0.9845094  0.2970705
#> [2,]  0.84661586  1.0606783 -0.9208496 -0.3731264
#> [3,]  0.03111838 -1.1736301 -0.5978684  0.1589368
#> [4,] -0.45340458 -0.1995142  0.0581531  1.0269315
#> [5,]  0.05504881 -0.8522622 -0.5425548  0.8150787
#> 
#> Principal components:
#>         [,1]       [,2]       [,3]       [,4]       [,5]        [,6]
#> 1  0.1009264  1.7834406 -0.1020584 -1.1204880  1.5380992  0.34439307
#> 2 -2.1883562 -2.4501380  0.6681822  0.3407541  0.8269323 -0.30967690
#> 3 -0.5637327 -0.4670429  2.4304105 -0.1406617 -0.7302521  0.26113645
#> 4  2.3301199 -0.3417037  0.0102262  0.3416691  0.4806338 -0.42944181
#> 5 -1.4152096 -0.4455945 -2.9004930 -0.7621925 -0.8536572 -0.19485874
#> 6 -1.4478108  1.8617775 -0.1864696  1.8808033  0.8624878 -0.01406987
#> 
#> Total elapsed time: 0.572 s

# plot the optimal encoding
plot(encoding)
#> Warning: Removed 13 rows containing missing values (`geom_line()`).


# plot the first two components
plotComponent(encoding, comp = c(1, 2))


# extract the optimal encoding
get_encoding(encoding, harm = 1)
#> $x
#>   [1] 0.00000000 0.03937008 0.07874016 0.11811024 0.15748031 0.19685039
#>   [7] 0.23622047 0.27559055 0.31496063 0.35433071 0.39370079 0.43307087
#>  [13] 0.47244094 0.51181102 0.55118110 0.59055118 0.62992126 0.66929134
#>  [19] 0.70866142 0.74803150 0.78740157 0.82677165 0.86614173 0.90551181
#>  [25] 0.94488189 0.98425197 1.02362205 1.06299213 1.10236220 1.14173228
#>  [31] 1.18110236 1.22047244 1.25984252 1.29921260 1.33858268 1.37795276
#>  [37] 1.41732283 1.45669291 1.49606299 1.53543307 1.57480315 1.61417323
#>  [43] 1.65354331 1.69291339 1.73228346 1.77165354 1.81102362 1.85039370
#>  [49] 1.88976378 1.92913386 1.96850394 2.00787402 2.04724409 2.08661417
#>  [55] 2.12598425 2.16535433 2.20472441 2.24409449 2.28346457 2.32283465
#>  [61] 2.36220472 2.40157480 2.44094488 2.48031496 2.51968504 2.55905512
#>  [67] 2.59842520 2.63779528 2.67716535 2.71653543 2.75590551 2.79527559
#>  [73] 2.83464567 2.87401575 2.91338583 2.95275591 2.99212598 3.03149606
#>  [79] 3.07086614 3.11023622 3.14960630 3.18897638 3.22834646 3.26771654
#>  [85] 3.30708661 3.34645669 3.38582677 3.42519685 3.46456693 3.50393701
#>  [91] 3.54330709 3.58267717 3.62204724 3.66141732 3.70078740 3.74015748
#>  [97] 3.77952756 3.81889764 3.85826772 3.89763780 3.93700787 3.97637795
#> [103] 4.01574803 4.05511811 4.09448819 4.13385827 4.17322835 4.21259843
#> [109] 4.25196850 4.29133858 4.33070866 4.37007874 4.40944882 4.44881890
#> [115] 4.48818898 4.52755906 4.56692913 4.60629921 4.64566929 4.68503937
#> [121] 4.72440945 4.76377953 4.80314961 4.84251969 4.88188976 4.92125984
#> [127] 4.96062992 5.00000000
#> 
#> $y
#>                   A            C          G            T
#>   [1,] -0.178575280           NA         NA           NA
#>   [2,] -0.131201243           NA -0.9814291           NA
#>   [3,] -0.085923907           NA         NA  0.236504358
#>   [4,] -0.042707756           NA         NA  0.208261049
#>   [5,] -0.001517271           NA -0.9713182  0.181350938
#>   [6,]  0.037683063           NA -0.9676591  0.155754056
#>   [7,]  0.074928764           NA -0.9638561  0.131450430
#>   [8,]  0.110255349           NA -0.9599095  0.108420090
#>   [9,]  0.143698336           NA -0.9558198  0.086643066
#>  [10,]  0.175293242           NA -0.9515872  0.066099386
#>  [11,]  0.205075584 -1.416328813         NA  0.046769080
#>  [12,]  0.233080878 -1.299146280         NA  0.028632178
#>  [13,]  0.259344643 -1.188071180         NA  0.011668707
#>  [14,]  0.283902396 -1.082964448         NA -0.004141301
#>  [15,]  0.306789653 -0.983687016         NA -0.018817819
#>  [16,]  0.328041932 -0.890099820         NA -0.032380817
#>  [17,]  0.347694750 -0.802063794         NA -0.044850266
#>  [18,]  0.365783624 -0.719439870         NA -0.056246136
#>  [19,]  0.382344072 -0.642088983         NA -0.066588399
#>  [20,]  0.397411610 -0.569872067         NA -0.075897024
#>  [21,]  0.411021756 -0.502650056         NA -0.084191984
#>  [22,]  0.423210027 -0.440283883         NA -0.091493248
#>  [23,]  0.434011940 -0.382634483         NA -0.097820787
#>  [24,]  0.443463013 -0.329562790         NA -0.103194573
#>  [25,]  0.451598762 -0.280929737         NA -0.107634575
#>  [26,]  0.458454705 -0.236596258 -0.8647184 -0.111160766
#>  [27,]  0.464066359 -0.196423287 -0.8581098 -0.113793114
#>  [28,]  0.468469241 -0.160271759 -0.8513645 -0.115551592
#>  [29,]  0.471698868 -0.128002607 -0.8444828 -0.116456171
#>  [30,]  0.473790758 -0.099476764 -0.8374651 -0.116526819
#>  [31,]  0.474780428 -0.074555166 -0.8303117 -0.115783510
#>  [32,]  0.474703394 -0.053098745 -0.8230231 -0.114246213
#>  [33,]  0.473595175 -0.034968436 -0.8155994 -0.111934899
#>  [34,]  0.471491287 -0.020025173 -0.8080411 -0.108869539
#>  [35,]  0.468427247 -0.008129889 -0.8003484 -0.105070103
#>  [36,]  0.464438573           NA -0.7925218 -0.100556563
#>  [37,]  0.459560782           NA -0.7845616 -0.095348889
#>  [38,]  0.453829390           NA -0.7764681 -0.089467052
#>  [39,]  0.447279916           NA -0.7682417 -0.082931023
#>  [40,]  0.439947877           NA -0.7598827 -0.075760773
#>  [41,]  0.431868789           NA -0.7513914 -0.067976272
#>  [42,]  0.423078170           NA -0.7427682 -0.059597491
#>  [43,]  0.413611537           NA -0.7340134 -0.050644401
#>  [44,]  0.403504407           NA -0.7251274 -0.041136972
#>  [45,]  0.392792297           NA -0.7161105 -0.031095176
#>  [46,]  0.381510725           NA -0.7069631 -0.020538983
#>  [47,]  0.369695208 -0.052508826 -0.6976854 -0.009488364
#>  [48,]  0.357381262 -0.067582261 -0.6882779  0.002036711
#>  [49,]  0.344604406 -0.083756751 -0.6787408  0.014016269
#>  [50,]  0.331400156 -0.100893229 -0.6690746  0.026430342
#>  [51,]  0.317804030 -0.118852629 -0.6592795  0.039258957
#>  [52,]  0.303851544 -0.137495885 -0.6493560  0.052482145
#>  [53,]  0.289578216 -0.156683930 -0.6393042  0.066079934
#>  [54,]  0.275019563 -0.176277700 -0.6291247  0.080032354
#>  [55,]  0.260211103 -0.196138127 -0.6188177  0.094319433
#>  [56,]  0.245188352 -0.216126146 -0.6083835  0.108921202
#>  [57,]  0.229986828 -0.236102690 -0.5978225  0.123817689
#>  [58,]  0.214642047 -0.255928694 -0.5871351  0.138988923
#>  [59,]  0.199189527           NA -0.5763216  0.154414934
#>  [60,]  0.183664786           NA -0.5653823  0.170075751
#>  [61,]  0.168103340           NA -0.5543176  0.185951404
#>  [62,]  0.152540707           NA -0.5431278  0.202021921
#>  [63,]  0.137012404 -0.347933291 -0.5318133  0.218267331
#>  [64,]  0.121553947 -0.363935663         NA  0.234667665
#>  [65,]  0.106200441 -0.378817878         NA  0.251203010
#>  [66,]  0.090977470 -0.392533154         NA  0.267854831
#>  [67,]  0.075901101 -0.405123151         NA  0.284605965
#>  [68,]  0.060986986 -0.416633375         NA  0.301439310
#>  [69,]           NA -0.427109330 -0.4616805  0.318337764
#>  [70,]           NA -0.436596522 -0.4497954  0.335284224
#>  [71,]           NA -0.445140455 -0.4379169  0.352261589
#>  [72,]           NA -0.452786635 -0.4260686  0.369252755
#>  [73,]           NA -0.459580566 -0.4142743  0.386240622
#>  [74,]           NA -0.465567755 -0.4025576  0.403208086
#>  [75,]           NA -0.470793705 -0.3909423  0.420138045
#>  [76,]           NA -0.475303921 -0.3794522  0.437013397
#>  [77,]           NA -0.479143910 -0.3681109  0.453817040
#>  [78,]           NA -0.482359176 -0.3569421  0.470531872
#>  [79,]           NA -0.484995223 -0.3459695  0.487140789
#>  [80,]           NA -0.487097558 -0.3352169  0.503626691
#>  [81,]           NA -0.488711684 -0.3247080  0.519972474
#>  [82,]           NA -0.489883108 -0.3144664  0.536161036
#>  [83,]           NA -0.490657334 -0.3045160  0.552175276
#>  [84,]           NA -0.491079868 -0.2948804  0.567998090
#>  [85,]           NA -0.491196213 -0.2855833  0.583612377
#>  [86,]           NA -0.491051876 -0.2766484  0.599001035
#>  [87,]           NA -0.490692362 -0.2680995  0.614146960
#>  [88,] -0.179116809 -0.490163175 -0.2599603  0.629033051
#>  [89,] -0.187007820 -0.489509820 -0.2522544  0.643642206
#>  [90,] -0.194392218 -0.488777803 -0.2450057  0.657957322
#>  [91,] -0.201254351 -0.488012629 -0.2382377  0.671961297
#>  [92,] -0.207578565 -0.487259802 -0.2319743  0.685637029
#>  [93,]           NA -0.486564828 -0.2262392  0.698967415
#>  [94,]           NA -0.485973212 -0.2210559  0.711935354
#>  [95,] -0.223167175 -0.485530459 -0.2164484  0.724523742
#>  [96,] -0.227183192 -0.485282074 -0.2124402  0.736715478
#>  [97,] -0.230583027 -0.485273561 -0.2090551  0.748493460
#>  [98,] -0.233351029 -0.485550427 -0.2063168  0.759840585
#>  [99,] -0.235471543 -0.486158176 -0.2042491  0.770739750
#> [100,] -0.236928919 -0.487142312 -0.2028756  0.781173855
#> [101,] -0.237707503 -0.488548343 -0.2022200  0.791125796
#> [102,] -0.237791643 -0.490421771 -0.2023061  0.800578470
#> [103,] -0.237165685 -0.492808102 -0.2031576  0.809514777
#> [104,] -0.235813977 -0.495752842 -0.2047981  0.817917613
#> [105,] -0.233720868 -0.499301496 -0.2072515  0.825769877
#> [106,] -0.230870703 -0.503499567 -0.2105414  0.833054465
#> [107,] -0.227247830 -0.508392563 -0.2146915  0.839754277
#> [108,] -0.222836597 -0.514025986 -0.2197256  0.845852209
#> [109,] -0.217621352 -0.520445344 -0.2256673  0.851331159
#> [110,] -0.211586440 -0.527696140 -0.2325404  0.856174025
#> [111,] -0.204716210 -0.535823880 -0.2403686  0.860363705
#> [112,] -0.196995009 -0.544874068 -0.2491756  0.863883097
#> [113,] -0.188407185 -0.554892211 -0.2589851  0.866715097
#> [114,] -0.178937084 -0.565923812 -0.2698208  0.868842605
#> [115,] -0.168569055 -0.578014377 -0.2817065  0.870248517
#> [116,] -0.157287444 -0.591209412 -0.2946658  0.870915732
#> [117,] -0.145076599 -0.605554420 -0.3087225  0.870827147
#> [118,] -0.131920867 -0.621094907 -0.3239003  0.869965660
#> [119,] -0.117804595 -0.637876379 -0.3402229  0.868314168
#> [120,] -0.102712132 -0.655944340 -0.3577139  0.865855570
#> [121,] -0.086627823 -0.675344295 -0.3763973  0.862572763
#> [122,] -0.069536017 -0.696121749 -0.3962965  0.858448644
#> [123,] -0.051421061 -0.718322208 -0.4174354  0.853466113
#> [124,] -0.032267303 -0.741991176 -0.4398377  0.847608065
#> [125,] -0.012059088 -0.767174159 -0.4635271  0.840857400
#> [126,]  0.009219234 -0.793916661 -0.4885272  0.833197015
#> [127,]  0.031583316 -0.822264188 -0.5148619  0.824609806
#> [128,]  0.055048813 -0.852262244 -0.5425548  0.815078674
#>
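
The values returned by get_encoding can be related to the coefficient matrix printed by summary. The sketch below assumes that the encoding function of a state is the linear combination of the basis functions weighted by that state's column of coefficients; the first and last rows of the output above are consistent with this, but it is an illustration and not the package's own code. The coefficients are copied from the "Optimal encoding" table above, and the NA entries of the output above are not reproduced here.

```r
library(fda)

# Same basis as in the example: 5 cubic B-splines on [0, 5].
b <- create.bspline.basis(c(0, 5), nbasis = 5, norder = 4)

# Coefficients of the first harmonic, copied from the summary output above
# (rows: basis functions, columns: states).
alpha1 <- matrix(
  c(-0.17857528, -2.9546575, -0.9845094,  0.2970705,
     0.84661586,  1.0606783, -0.9208496, -0.3731264,
     0.03111838, -1.1736301, -0.5978684,  0.1589368,
    -0.45340458, -0.1995142,  0.0581531,  1.0269315,
     0.05504881, -0.8522622, -0.5425548,  0.8150787),
  nrow = 5, byrow = TRUE,
  dimnames = list(NULL, c("A", "C", "G", "T"))
)

# Evaluate the basis on the same grid as get_encoding (128 points on [0, 5]).
x <- seq(0, 5, length.out = 128)
phi <- eval.basis(x, b)  # 128 x 5 matrix of basis function values
y_full <- phi %*% alpha1 # 128 x 4 matrix, one column per state

# At the two endpoints only the first (resp. last) B-spline is non-zero,
# so these rows equal the first and last rows of alpha1.
y_full[c(1, 128), ]
```

Comparing y_full with the $y element above at the rows where a state is not NA is one way to check this reading of the coefficients on your own fit.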