The Development of Web-based Graphical User Interface for Learning and Fitting Generalized Estimating Equation with Spline Smoothers

Statistical modeling (regression analysis) has been growing rapidly in various directions to accommodate various data conditions. For longitudinal or repeated-measures data, one suitable model is the GEE (Generalized Estimating Equation). In practice, complex modeling such as GEE requires statistical software, and it is available in the free open-source software R. However, GEE modeling in R can only be accessed through a command line interface (CLI), whereas most practical researchers rely heavily on statistics software with a Graphical User Interface (GUI). To make access to GEE (both order 1 and order 2) much easier, we developed, using the Shiny toolkit, two types of web-based GUI: a standard pull-down menu type and an e-module type (with narrative theory), both of which can be used for learning and fitting GEE. This paper discusses the features of the interfaces and illustrates their use.


INTRODUCTION
Statistical models have been developed in various directions to accommodate the objectives of researchers and the complexity of data (in terms of sample size and the number, types, and structure of the variables involved).
In practice, most applications of these models need statistical software. The majority of practical researchers can only access models that are already implemented in commonly available GUI-based software, while statistical theory grows much more rapidly than what is readily implemented in that type of software. This situation worries statisticians (Wallace et al., 2012), who note that "practitioners continue to use inappropriate or suboptimal methods due to their being restricted to what is made available via GUIs".
New statistical methods are most frequently implemented or tested in open-source R, and R has a rich collection of the most recent and advanced statistical methods. However, all of them (including GEE) are still script- or CLI-based, hence they cannot be used by the majority of 'practitioners'. For ordinary and occasional R users, even installing R with the required additional packages is not simple.
The need for a GUI for R was already recognized around the last decade, and several GUIs have been developed, such as R Commander, which covers most basic statistics (Fox, 2005), RKWard (Rodiger et al., 2012), and Deducer (Fellows, 2012). R Commander and Deducer have plug-in systems that allow other programmers to extend their coverage.
However, GEE has not yet been implemented either in the core or as a plug-in of the above GUIs.
The development of web-based GUIs has become easier since the release of the shiny framework (Chang et al. 2015). This framework has been used to unify regression analysis (in pull-down menu type) for independent/univariate responses (Tirta, et al. 2017). In this paper we develop web-GUIs for easier, friendlier access to and guided application of GEE (order 1 and order 2), in both pull-down menu and e-tutorial/e-module types. In addition, we extend the GEE by combining it with a spline smoother in the mean model.
The rest of the paper is organized as follows: section II briefly summarizes theory of GEEs, section III describes the main methods or tools, section IV discusses the results and numerical illustration and section V concludes the results of the study.

REVIEW OF GEE

GEE Order 1
The original GEE (later known as GEE 1 or GEE order 1) was formally introduced by Liang and Zeger and by Zeger and Liang in 1986. GEE can be considered an extension of the GLM (Generalized Linear Model, McCullagh & Nelder, 1989) that accounts for correlation among the responses. This type of response is commonly found in repeated-measurement or longitudinal data. The distributions considered in this model are mainly those from the exponential family, with corresponding valid link functions (McCullagh & Nelder, 1989). The mean responses are modeled as in GLM with an appropriate link function, g, as given in equation (1), for p predictors.
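A standard form of the mean model that is consistent with the description above can be written as follows. The subscripts i (subject or cluster) and j (repeated occasion), and the symbols x_{ijl} and \beta_l, are assumed notation chosen for illustration rather than taken from the original equation (1).

```latex
\begin{equation}
  g(\mu_{ij}) = \beta_0 + \sum_{l=1}^{p} \beta_l \, x_{ijl},
  \qquad \mu_{ij} = \mathrm{E}(Y_{ij}),
  \quad i = 1, \dots, n, \; j = 1, \dots, k .
\end{equation}
```

With the identity link, g(\mu) = \mu, this reduces to the usual linear mean model; with the logit link it gives the mean model for binary responses.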
The most common correlation structures available for GEE, for responses with k repeated measurements (cluster size k), are: (a) exchangeable (uniform), with 1 correlation parameter (cp); (b) AR-1 (autoregressive of order 1), with 1 cp; (c) independence, with no cp (identical to GLM); and (d) unstructured, with (k^2 - k)/2 cp's. The forms of the correlation matrices are as given in equations 2(a)-2(d).
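The four structures listed above can be made concrete with a small sketch. The code below is an illustration in plain Python, not part of the interfaces described in this paper; the function names and the parameter name alpha are hypothetical. It builds the k x k working correlation matrix for the exchangeable, AR-1, and independence structures and counts the correlation parameters of each structure.

```python
def exchangeable(k, alpha):
    """k x k matrix with 1 on the diagonal and alpha everywhere else."""
    return [[1.0 if i == j else alpha for j in range(k)] for i in range(k)]


def ar1(k, alpha):
    """k x k matrix whose (i, j) entry is alpha ** |i - j|."""
    return [[alpha ** abs(i - j) for j in range(k)] for i in range(k)]


def independence(k):
    """k x k identity matrix: no correlation parameter is estimated."""
    return [[1.0 if i == j else 0.0 for j in range(k)] for i in range(k)]


def n_corr_params(structure, k):
    """Number of correlation parameters (cp) for each structure."""
    counts = {
        "independence": 0,
        "exchangeable": 1,
        "ar1": 1,
        "unstructured": (k * k - k) // 2,  # one cp per distinct pair
    }
    return counts[structure]


if __name__ == "__main__":
    k = 5  # five repeated measurements per subject
    for row in ar1(k, 0.9):
        print(" ".join(f"{v:.3f}" for v in row))
    print("unstructured cp's:", n_corr_params("unstructured", k))
```

For k = 5 repeated measurements, the unstructured form needs (25 - 5)/2 = 10 correlation parameters, whereas the exchangeable and AR-1 forms need only one each.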

GEE Order 2
In GEE (order 1), the scale parameter φ is treated as constant. GEE order 2 extends the GEE by allowing the scale parameter to depend on some predictors (in our case limited to linear predictors). That is, in addition to the model for the mean, there is also a model for the scale parameter, as given in equation (3).
It is well known that misspecification of the correlation structure in GEE 1 does not influence the point estimates of the regression parameter β; however, it does influence their efficiency, as indicated by their standard errors (Hidayati et al.). Therefore it is still important to check for the best correlation structure as well as the model for the scale parameter. In addition to the model for the scale parameter, GEE2 also allows the correlation to have one of two available link functions, namely identity and Fisher.
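A scale model of the kind referred to as equation (3) can be sketched as below. This is a sketch under stated assumptions: the identity link on the scale is assumed (as used in our interfaces), and the symbols \phi_{ij}, \gamma_l, z_{ijl}, and q are notation introduced here for illustration, not the original notation.

```latex
\begin{equation}
  \phi_{ij} = \gamma_0 + \sum_{l=1}^{q} \gamma_l \, z_{ijl},
\end{equation}
% z_{ij1}, ..., z_{ijq}: predictors of the scale, which may differ from
% the predictors x_{ijl} used in the mean model.
```

Setting all \gamma_l = 0 for l >= 1 recovers the constant scale parameter of GEE order 1.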
Considering Smoother for the Mean Model
Modeling the mean as linear in the regression parameters, composed with a link function, is sometimes not sufficient to account for nonlinearity in the predictors. In other words, there is a need for a model that can be considered an extension of GAM for GEE. Recently, there have been many suggestions in discussion forums to include a natural spline or B-spline smoother in the model. This is possible since natural splines and B-splines in R are flexible enough to be combined with other models (such as those fitted by lm(), glm(), and gee()).
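To illustrate what a natural cubic spline term adds to the linear predictor, the sketch below evaluates a natural cubic spline basis in plain Python. It uses the truncated power construction, which is an assumption made for readability: R builds its natural spline basis from B-splines, so the columns differ, although both span the same function space for the same knots. The function names and the knot values are hypothetical.

```python
def _pos_cube(u):
    """Truncated cubic: u**3 if u > 0, else 0."""
    return u ** 3 if u > 0 else 0.0


def natural_spline_basis(x, knots):
    """Evaluate the K natural cubic spline basis functions at x.

    knots must be sorted and contain at least 3 distinct values; the
    first and last knots act as boundary knots. Returns the list
    [1, x, N_3(x), ..., N_K(x)], which is linear outside the boundary.
    """
    K = len(knots)
    last = knots[-1]

    def d(k):
        # Scaled difference of truncated cubics anchored at knot k.
        return (_pos_cube(x - knots[k]) - _pos_cube(x - last)) / (last - knots[k])

    basis = [1.0, float(x)]
    d_ref = d(K - 2)  # term anchored at the second-to-last knot
    for k in range(K - 2):
        basis.append(d(k) - d_ref)
    return basis


if __name__ == "__main__":
    knots = [0.0, 1.0, 2.0, 3.0]  # two boundary knots and two interior knots
    for x in (-1.0, 0.5, 1.5, 2.5, 4.0):
        print(x, [round(b, 4) for b in natural_spline_basis(x, knots)])
```

Dropping the constant column leaves three columns for four knots, which matches a natural spline term with df = 3 when the intercept is handled separately by the model. Each remaining column then enters the mean model like an ordinary predictor with its own coefficient.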

METHODS
The web-based interfaces are built using the shiny toolkit, running on the Shiny Server from the RStudio group (Chang et al., 2015). We develop two types of web-GUI for modeling longitudinal data using GEE: a standard pull-down menu type and an e-tutorial/e-module type. The pull-down menu type consists of 2 main files, ui.r (for communication with the user) and server.r (for communication with the R server), while for the e-module type the ui.r is replaced by index.html (in this html file we can add several extensions, such as theory and step-by-step procedures, by utilizing html features). Therefore, in addition to the computational features (as in the pull-down menu type), the e-module type has a description of the theory and narration that guides users through every step of modeling and interpreting the outputs.

RESULTS AND DISCUSSION
The developed interfaces have been uploaded to our 'virtual statistics laboratory'. The pull-down menu type can be accessed at http://statslab-rshiny.fmipa.unej.ac.id/RProg/MSD/ (See

Data Illustration
As an illustration, we simulated data with both continuous (SHB) and binary (BS) responses, which are measured repeatedly (in the form of five tests, T1-T5). To calculate the correlation matrix and to display the correlation diagram (among the repeated responses), the names of the id variable and of the repeated-observation variable must be properly chosen. The correlation matrix and correlation diagram (Fig 3) of the repeated observations are given below. The correlation matrix or diagram gives a rough idea of suitable candidate correlation structures (in this case AR-1 seems appropriate, since the correlation decreases slightly as the distance between tests increases). Note that on the web, users may find slightly different data under the same name, since the data are randomly simulated.
      T1     T2     T3     T4     T5
T1  1.000  0.951  0.834  0.793  0.785
T2  0.951  1.000  0.905  0.858  0.818
T3  0.834  0.905  1.000  0.943  0.891
T4  0.793  0.858  0.943  1.000  0.943
T5  0.785  0.818  0.891  0.943  1.000

We check the possibility of including a smoother in the mean model by inspecting the scatter plots of some predictors against the response. After comparing the graphs, we see that the variable STPA with df = 3 seems worth including in the mean model (Fig 4). For the distribution, link, and correlation structure, we can simply select the corresponding menus (see Fig 2). The outputs are interpreted as usual, by checking the p-values (for the significance of the parameters) and the value of QIC (for the goodness of fit of the model). In this illustration, all the regression coefficients (for both the mean and the scale parameter) are significant (p-value < 5%). Users can modify the model (family, correlation structure, etc.) and compare the QIC values. (Note that the model for the scale parameter does not affect QIC, so for the scale model we only check the p-values.) The results for some tested models are summarized in Table 2, followed by the full output of what is considered the best model.
In modelling with GEE, a predictor cannot enter the model as both a linear and a nonparametric predictor (in our interface, a natural spline) at the same time. In practice, for prediction purposes, the repeated-measurement factor (in this illustration, Test) may be retained in the model even though its parameters are not significant, especially where predictions for every level of the repeated factor are needed (Tirta, et al).
In the final model, the scale parameter is expressed as scale = 177.97 + 3.1933*SUN. Overall, the response Y has a Gaussian distribution with identity link, and the repeated observations have an exchangeable correlation structure with an estimated correlation parameter of 0.78.
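The way a sample correlation matrix hints at a candidate structure can be checked numerically. The sketch below, in plain Python, averages the sample correlations of T1-T5 reported above at each lag and compares them with the values an AR-1 structure would imply. Using the lag-1 average as the AR-1 parameter is a simplifying assumption made here for illustration; it is not the estimator used by the GEE software, and the function name is hypothetical.

```python
# Sample correlation matrix of the repeated responses T1-T5 (values from the text).
R = [
    [1.000, 0.951, 0.834, 0.793, 0.785],
    [0.951, 1.000, 0.905, 0.858, 0.818],
    [0.834, 0.905, 1.000, 0.943, 0.891],
    [0.793, 0.858, 0.943, 1.000, 0.943],
    [0.785, 0.818, 0.891, 0.943, 1.000],
]


def mean_corr_by_lag(corr):
    """Average the entries of each superdiagonal (lag 1, 2, ..., k-1)."""
    k = len(corr)
    means = {}
    for lag in range(1, k):
        vals = [corr[i][i + lag] for i in range(k - lag)]
        means[lag] = sum(vals) / len(vals)
    return means


lag_means = mean_corr_by_lag(R)
rho = lag_means[1]  # crude AR-1 parameter: the average lag-1 correlation
for lag, observed in lag_means.items():
    print(f"lag {lag}: observed {observed:.3f}, AR-1 implied {rho ** lag:.3f}")
```

The averages fall steadily with the lag (roughly 0.94, 0.86, 0.81, and 0.79), which is the decaying pattern that suggests AR-1. An exchangeable structure would instead show approximately equal correlations at every lag, so comparing the QIC values of the candidate structures remains the more reliable guide.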

Discussion
Our web GUI interfaces fill the gap left by the absence of a GUI for GEE (both order 1 and 2) in R. In terms of the available choices of components, our web-GUI offers all of the common choices and the output of the geese() function (ready for modeling GEE1 and GEE2). In addition, we offer features that are not part of the geepack package: (i) the correlation matrix and correlation diagram, (ii) smoother models and graphics, and (iii) QIC as guidance for choosing a better model. Users can use these options without worrying about the syntax of the underlying functions, or even about installing them on their computer. The e-module type not only summarizes the theory (as found in commonly available e-modules, such as one popular online textbook on statistics) (StatSoft, Inc., 2013), but also presents proper mathematical notation and lets users dynamically choose different data (including their own) and follow, step by step, the checking of various models and the interpretation of the output.

Limitation and Future Development
Our web-GUI has some limitations, including: (i) response distributions are limited to exponential family distributions, (ii) the model for the scale is limited to a linear model with identity link, and (iii) QIC does not account for the model for the scale. We have recently started extending the interfaces to include GEE for nominal or ordinal multinomial responses through the multgee package (Touloumis, 2015) and possibly vector generalized linear and additive models (VGAM) (Yee, 2016); these features will be added in the near future. For the e-module type, the structure of the description of the theory and the modelling steps needs to be continuously improved by considering users' feedback.

CONCLUSION
Our web-based GUI for GEE can be accessed from various types of devices and browsers without worrying about scripting or about collecting and installing R and related packages. Our web-GUI offers all the important options of geese(), as if it were accessed via script. In addition, the web interface also calculates and presents the sample correlation matrix and correlation diagram, and the QIC of each chosen model. Users can also include a smoother (using a natural cubic spline) in modeling the mean responses.

Figure 2 .
Figure 1. The appearance of the pull-down menu type. Users can easily choose variables, type of distribution, and correlation structure.

Figure 3. Correlation Diagram of the Repeated Responses
Figure 4.

Table 1. Main computational features of the interfaces

Table 2. Comparison of the QIC values for some tested models