SamplingDesignTools: Tools for Dealing with Complex Sampling Designs
Getting Started
Installation
Install the SamplingDesignTools package from GitHub (this requires the devtools package):
install.packages("devtools") # if devtools is not yet installed
devtools::install_github("nyilin/SamplingDesignTools")
Load the package, together with the other packages used in this guide:
library(SamplingDesignTools)
library(survival)
library(Epi) # to draw (non-counter-matched) nested case-control samples
library(dplyr)
##
## Attaching package: 'dplyr'
## The following objects are masked from 'package:stats':
##
## filter, lag
## The following objects are masked from 'package:base':
##
## intersect, setdiff, setequal, union
library(knitr)
Example Datasets
This package uses two simulated cohort datasets (cohort_1 and cohort_2) for illustrative purposes.
cohort_1
Dataset cohort_1 consists of 10,000 subjects, with age simulated from \(N(55, 10^2)\) and gender simulated with \(P(\text{Male}) = 0.5\). The survival outcome was simulated from the following true hazard: \[\log \{h(t)\} = \log \{h_0\} + \log(1.1) \text{Age} + \log(2) \text{Gender}.\]
Time (\(t\)) is measured in years and censored at 25 years. Censoring is indicated by \(y=0\).
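For readers who want to see how a cohort of this shape could be generated, here is a minimal sketch in base R. It is not the code used to build cohort_1: the constant baseline hazard `h0`, the exponential event times, the seed, the coding of gender as 1 = male, and the object name `sim_cohort` are all assumptions made for illustration.

```r
# Sketch only: simulate a cohort resembling the description above.
# ASSUMPTIONS (not taken from the package): constant baseline hazard h0,
# exponential event times, gender coded 1 = male, and the seed.
set.seed(1)
n <- 10000
age <- rnorm(n, mean = 55, sd = 10)        # Age ~ N(55, 10^2)
gender <- rbinom(n, size = 1, prob = 0.5)  # P(Male) = 0.5
h0 <- 5e-6                                 # hypothetical baseline hazard

# log-hazard = log(h0) + log(1.1) * Age + log(2) * Gender
lp <- log(1.1) * age + log(2) * gender
t_event <- rexp(n, rate = h0 * exp(lp))    # event times under a constant hazard

# Administrative censoring at 25 years; y = 0 marks a censored subject
t <- pmin(t_event, 25)
y <- as.integer(t_event <= 25)

sim_cohort <- data.frame(id = seq_len(n), y = y, t = t,
                         age = age, gender = gender)
```

The resulting data frame has the same five columns as cohort_1, but its values and event rate will differ from the packaged dataset.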
data("cohort_1")
dim(cohort_1)
## [1] 10000 5
table(cohort_1$y)
##
## 0 1
## 9418 582
kable(head(cohort_1))
id | y | t | age | gender |
---|---|---|---|---|
1 | 0 | 25.00000 | 47 | 1 |
2 | 0 | 10.65152 | 58 | 0 |
3 | 0 | 25.00000 | 46 | 0 |
4 | 0 | 15.84131 | 52 | 0 |
5 | 0 | 22.57659 | 49 | 0 |
6 | 0 | 25.00000 | 63 | 0 |
m_cox_cohort_1 <- coxph(Surv(t, y) ~ age + gender, data = cohort_1)
summary(m_cox_cohort_1)
## Call:
## coxph(formula = Surv(t, y) ~ age + gender, data = cohort_1)
##
## n= 10000, number of events= 582
##
## coef exp(coef) se(coef) z Pr(>|z|)
## age 0.100340 1.105546 0.004213 23.820 <2e-16 ***
## gender 0.781068 2.183804 0.087990 8.877 <2e-16 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## exp(coef) exp(-coef) lower .95 upper .95
## age 1.106 0.9045 1.096 1.115
## gender 2.184 0.4579 1.838 2.595
##
## Concordance= 0.784 (se = 0.009 )
## Likelihood ratio test= 643.4 on 2 df, p=<2e-16
## Wald test = 643.8 on 2 df, p=<2e-16
## Score (logrank) test = 642.4 on 2 df, p=<2e-16
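As a quick sanity check, the fitted coefficients can be compared with the true values \(\log(1.1)\) and \(\log(2)\) used in the simulation. The sketch below copies the estimates and standard errors from the summary output above (so it runs without the package) and checks whether 95% Wald intervals on the log-hazard scale contain the true values. The object names are illustrative.

```r
# Estimates and standard errors copied from the coxph summary above
est <- c(age = 0.100340, gender = 0.781068)
se  <- c(age = 0.004213, gender = 0.087990)

# True log hazard ratios from the simulation model
truth <- c(age = log(1.1), gender = log(2))

# 95% Wald confidence intervals on the log-hazard scale
z <- qnorm(0.975)
ci <- cbind(lower = est - z * se, upper = est + z * se)

# TRUE where the interval contains the true value
covered <- truth >= ci[, "lower"] & truth <= ci[, "upper"]
print(round(exp(ci), 3))  # back-transformed to the hazard ratio scale
print(covered)
```

Both intervals contain the true value, and the back-transformed limits agree with the "lower .95" and "upper .95" columns printed by `summary()`.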
cohort_2
Dataset cohort_2 consists of 100,000 subjects, with the survival outcome simulated from the following true hazard: \[\log \{h(t)\} = \log \{h_0\} + \log(1.5)x + \log(4)z + \log(2)xz + \log(1.01) \text{Gender} + \log(1.01) \text{Age}.\]
Time (\(t\)) is measured in years and censored at 25 years. Censoring is indicated by \(y=0\). Age is also recorded in 6 categories: <35, 36-45, 46-55, 56-65, 66-75 and >75.
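Because the model contains an interaction between x and z, the hazard ratio for one variable depends on the level of the other. The following sketch works this out from the true coefficients, holding age and gender fixed. The helper `hr()` and the object `hr_table` are hypothetical names, and treating x and z as 0/1 variables follows the values visible in the data rather than a documented specification.

```r
# True log hazard ratios for x, z and their interaction
beta <- c(x = log(1.5), z = log(4), xz = log(2))

# Hazard ratio relative to x = 0, z = 0 (age and gender held fixed)
hr <- function(x, z) {
  exp(beta[["x"]] * x + beta[["z"]] * z + beta[["xz"]] * x * z)
}

# Rows index x, columns index z
hr_table <- outer(0:1, 0:1, hr)
dimnames(hr_table) <- list(x = c("0", "1"), z = c("0", "1"))
print(hr_table)
```

Under the true model, a subject with x = 1 and z = 1 has 1.5 × 4 × 2 = 12 times the hazard of a subject with x = 0 and z = 0.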
data("cohort_2")
dim(cohort_2)
## [1] 100000 8
table(cohort_2$y)
##
## 0 1
## 97227 2773
kable(head(cohort_2))
id | y | t | x | age | age_cat | gender | z |
---|---|---|---|---|---|---|---|
1 | 0 | 25.000000 | 1 | -2 | (45,55] | 0 | 0 |
2 | 0 | 19.819801 | 1 | -4 | (45,55] | 1 | 0 |
3 | 0 | 25.000000 | 1 | -5 | (45,55] | 0 | 0 |
4 | 0 | 12.414616 | 1 | 20 | (75, Inf] | 1 | 0 |
5 | 0 | 25.000000 | 1 | -2 | (45,55] | 0 | 1 |
6 | 0 | 1.019023 | 0 | -15 | (35,45] | 1 | 0 |
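The interval labels in age_cat look like the output of base R's `cut()` with right-closed intervals. The sketch below shows how such categories could be produced. It is an assumption about how the column was built, not the package's actual code. The ages are hypothetical values in years, since the age column in the rows above does not appear to be on the raw year scale.

```r
# Hypothetical ages in years (not taken from cohort_2)
age_years <- c(30, 41, 54, 76)

# Break points giving six right-closed age bands
breaks <- c(-Inf, 35, 45, 55, 65, 75, Inf)
age_cat <- cut(age_years, breaks = breaks)

print(levels(age_cat))                            # the six interval labels
print(data.frame(age_years, age_cat))
```

With right-closed intervals, an age of exactly 45 falls in (35,45] and an age of exactly 35 falls in the lowest band.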
m_cox_cohort_2 <- coxph(Surv(t, y) ~ x * z + age + gender, data = cohort_2)
summary(m_cox_cohort_2)
## Call:
## coxph(formula = Surv(t, y) ~ x * z + age + gender, data = cohort_2)
##
## n= 100000, number of events= 2773
##
## coef exp(coef) se(coef) z Pr(>|z|)
## x 0.382501 1.465946 0.109906 3.480 0.000501 ***
## z 1.495078 4.459686 0.126331 11.835 < 2e-16 ***
## age 0.007139 1.007165 0.001898 3.762 0.000169 ***
## gender -0.086074 0.917526 0.038010 -2.265 0.023542 *
## x:z 0.640698 1.897805 0.135081 4.743 2.11e-06 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## exp(coef) exp(-coef) lower .95 upper .95
## x 1.4659 0.6822 1.1819 1.8183
## z 4.4597 0.2242 3.4815 5.7127
## age 1.0072 0.9929 1.0034 1.0109
## gender 0.9175 1.0899 0.8517 0.9885
## x:z 1.8978 0.5269 1.4564 2.4730
##
## Concordance= 0.762 (se = 0.005 )
## Likelihood ratio test= 2907 on 5 df, p=<2e-16
## Wald test = 2435 on 5 df, p=<2e-16
## Score (logrank) test = 3528 on 5 df, p=<2e-16
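With an interaction term, the estimated hazard ratio for x = 1 and z = 1 versus x = 0 and z = 0 is not shown directly in the summary. It is obtained by summing the three relevant coefficients. The sketch below does this with the coefficients copied from the output above and compares the result with the true value implied by the simulation model. Object names are illustrative.

```r
# Coefficients copied from the coxph summary above
coefs <- c(x = 0.382501, z = 1.495078, `x:z` = 0.640698)

# Estimated joint hazard ratio: exp(beta_x + beta_z + beta_xz)
hr_est_joint <- exp(sum(coefs))

# True joint hazard ratio from the simulation model
hr_true_joint <- 1.5 * 4 * 2

print(c(estimated = hr_est_joint, true = hr_true_joint))
```

A confidence interval for this joint hazard ratio would need the covariance between the three estimates (for example from `vcov(m_cox_cohort_2)`), which the printed summary does not show.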