NEW PREDICTIVE MODELS OF CARDIOVASCULAR DISEASE

Antonio Palazón Bru

www.ua.es

www.eltallerdigital.com


UNIVERSIDAD DE ALICANTE

Department of Applied Mathematics

Escuela Politécnica Superior

Doctoral Programme in Mathematical Methods and Modelling in Science and Engineering

Thesis submitted for the degree of

DOCTOR BY THE UNIVERSIDAD DE ALICANTE

SUPERVISED BY:

Dr. Isabel Vigo Aguiar, Catedrático de Escuela Universitaria

Dr. Vicente Francisco Gil Guillén, Profesor Titular de Universidad



To my parents and my grandfather, for their unconditional support and affection throughout this project.



To my thesis supervisors, for their collaboration in carrying out this doctoral thesis.

To Professor Dr. Ramón Ángel Durazo, for his teaching on the prediction of cardiovascular diseases.

To Professor Dr. Julio Antonio Carbayo Herencia, for his valuable help with all the publications belonging to this doctoral thesis.

To Mr. Ian Jonstone, for his help throughout these years of work: his comments and advice have contributed substantially to my professional career, in addition to his linguistic support with the scientific articles included in this doctoral thesis.

To the GEVA group, for allowing me to take part in the project to construct the cardiovascular risk scale.

To Ms. Maria Repice, for her contribution to the translation of the scientific articles presented in this doctoral thesis.



“No human investigation can be called science if it does not pass through mathematical proofs.” Leonardo da Vinci



CONTENTS

1. SUMMARY OF THE PUBLICATIONS PRESENTED.

1.1. Background and current state of the topic.

1.2. Justification.

1.3. Objectives.

1.4. Articles presented.

1.5. Methods.

1.5.1. Basic models used in the proposed improvements.

1.5.2. Short-, medium- and long-term predictions.

1.5.3. Determining the best combination of variables to predict a cardiovascular event.

1.5.4. Construction, validation and use of a points system with repeated measures of cardiovascular risk factors.

1.6. Results.

1.6.1. Short-, medium- and long-term predictions.

1.6.2. Determining the best combination of variables to predict a cardiovascular event.

1.6.3. Construction, validation and use of a points system with repeated measures of cardiovascular risk factors.

1.7. Discussion.

1.8. References.

2. PUBLISHED WORKS.

3. CONCLUSIONS.



1. SUMMARY OF THE PUBLICATIONS PRESENTED.



1.1. Background and current state of the topic.

Since cardiovascular diseases are currently one of the leading causes of death worldwide (WHO, 2014), there is great interest in developing prediction models that identify the risk factors on which we can intervene to reduce the probability that a patient develops a cardiovascular disease (Molinero, 2003).

The simplest model for predicting a dichotomous outcome, such as cardiovascular disease, is the binary logistic regression model (Hosmer & Lemeshow, 2000), which yields an equation that gives the probability of the disease once the values of the risk factors are known. However, this type of model does not take exposure time into account. Survival models do: they analyse the time until a given event occurs, the best-known method being the Cox regression model (Hosmer & Lemeshow, 2008). It is not the only alternative; other survival methods are available, called parametric because they assume a specific type of distribution, such as the Weibull distribution used by the SCORE project (Conroy et al., 2003). The Framingham study (National Heart, Lung, and Blood Institute, 2015), for its part, has used both logistic regression models and survival models (parametric and non-parametric).

The reference risk charts are SCORE in Europe and the Framingham chart in the United States. These charts are based on a follow-up time of 10 years, so they do not provide accurate predictions in the short or medium term (Cooney, Dudina & Graham, 2009). This is clinically relevant: if we know that a patient has a high probability of developing a cardiovascular disease in the short term (for example, 2 years), the patient should receive more intensive pharmacological and non-pharmacological therapy, since otherwise he or she may well develop a cardiovascular disease.



Alongside the Framingham and SCORE predictive models, other models have been developed that are also used in clinical practice, though to a lesser extent, such as the Reynolds risk score and the WHO/ISH score (Cooney, Dudina & Graham, 2009). All these models share a common objective, predicting cardiovascular disease over a 10-year period, but they approach the problem with different mathematical models (Cox and Weibull) and consider different outcomes (coronary heart disease morbidity and mortality, coronary heart disease mortality, cardiovascular morbidity and mortality, or cardiovascular mortality). This allows long-term decisions to be made for patients. Finally, we note that clinical practice guidelines recommend these predictive models for stratifying patients' cardiovascular risk. For example, the European guidelines on cardiovascular disease prevention state that a risk estimation system such as SCORE can assist decision-making and thereby help avoid both overtreatment and undertreatment (Perk et al., 2012). In other words, clinicians follow the guidelines to improve their decisions with the aim of preventing cardiovascular disease, and it is these guidelines that call for the use of these predictive models, which gives them great relevance in routine clinical practice.

Regarding the choice of variables included in the predictive models, cardiovascular studies have generally collected a multitude of variables that could influence each patient's prognosis. In practice, however, not all of them can be included in the model, since this would increase its complexity and the estimation of its parameters might fail to converge (Hosmer & Lemeshow, 2008). For this reason, most studies have built stepwise models based on a statistical test, such as the likelihood ratio test or the score test. However, these steps may overlook a combination of variables that gives a better prognosis of cardiovascular disease: a variable may not be statistically significant on its own and yet improve the prediction of cardiovascular disease when combined with another group of variables (López-Bru et al., 2015; Ramírez-Prado et al., 2015).

Given the complexity of these mathematical models, an algorithm has been applied that lets the clinician understand them easily, at the cost of some precision in the estimated probability of cardiovascular disease (Sullivan, Massaro & D'Agostino, 2004). To this end, the mathematical models have been transformed into colour-coded risk charts that can be used systematically in clinical practice. Nevertheless, these charts are based on models that use clinical variables measured at the patient's baseline (Conroy et al., 2003; Cooney, Dudina & Graham, 2009; National Heart, Lung, and Blood Institute, 2015), so they do not account for the variability of these variables over time: the biological parameters are treated as constant throughout follow-up, when in fact they vary greatly and the physician can intervene with pharmacological treatment to lower or raise them rapidly (NCEP, 2002; American Diabetes Association, 2014; James et al., 2014; Stone et al., 2014).

There are survival predictive models for other types of disease that do take into account the temporal variability of a single biological marker, in addition to baseline variables. These are known as Joint Models for Longitudinal and Time-to-Event Data and are structured in two parts: (i) a linear mixed model is applied to determine the trajectory of the longitudinal parameter, and (ii) a survival model relates the baseline variables and the longitudinal parameter to the occurrence of an event. Models of this type can be used to make more accurate predictions of the development of a disease (Rizopoulos, 2012). However, given their complexity, they are not used in routine clinical practice. Furthermore, joint modelling in which the survival part is a linear function of multiple longitudinal parameters (the usual formulation in classical survival analysis applied to the health sciences) has only been addressed theoretically and remains a major computational challenge, which has prevented the development of prediction algorithms like those available in the univariate case (Rizopoulos, 2011).



1.2. Justification.

Existing predictive models of cardiovascular disease need improvements that are genuinely useful to health professionals, so that the patient's prognosis improves without losing the simplicity of points systems, which are so widely used in routine clinical practice. These improvements should apply both to data sets that contain only baseline information and to those that contain repeated measures of cardiovascular risk factors over time.

1.3. Objectives.

The general objective is to improve the predictive models of cardiovascular disease currently used in routine clinical practice. To this end, the following specific objectives are set:

1. To detail a methodology for constructing and validating a predictive model of cardiovascular disease using only baseline information, which yields short-, medium- and long-term predictions and thereby helps reduce the incidence of cardiovascular disease in the population, bearing in mind that current points systems do not provide short- or medium-term probabilities.

2. To detail how to select the explanatory variables of a predictive model of cardiovascular disease, considering all the combinations of variables that the model under construction can support.

3. To detail a methodology for constructing predictive models of cardiovascular disease that can be applied systematically in routine clinical practice (without giving up points systems, so widely used today) and that determine the risk of developing the disease as accurately as possible (taking into account the temporal variability of all cardiovascular risk factors).

1.4. Articles presented.

The objectives set out above have been developed in three scientific publications in high-impact JCR (Science Ed.) journals:

1. Artigao-Ródenas LM, Carbayo-Herencia JA, Palazón-Bru A, Divisón-Garrote JA, Sanchis-Domènech C, Vigo-Aguiar I, Gil-Guillén VF; GEVA (Group of Vascular Diseases from Albacete). 2015. Construction and validation of a 14-year cardiovascular risk score for use in the general population: the PURAS-GEVA chart. Medicine (Baltimore) 94: e1980.

Medicine is a scientific journal covering all medical specialties and subspecialties, with an impact factor of 5.723 (JCR 2014), ranked 15th out of 153 journals in the category MEDICINE, GENERAL & INTERNAL.

2. Ramírez-Prado D, Palazón-Bru A, Folgado-de la Rosa DM, Carbonell-Torregrosa MÁ, Martínez-Díaz AM, Martínez-St John DR, Gil-Guillén VF. 2015. A four-year cardiovascular risk score for type 2 diabetic inpatients. PeerJ 3: e984.

3. Palazón-Bru A, Carbayo-Herencia JA, Vigo-Aguiar I, Gil-Guillén VF. 2016. A method to construct a points system to predict cardiovascular disease considering repeated measures of risk factors. PeerJ 4: e1673.

PeerJ is a scientific journal with an impact factor of 2.112, ranked 13th in the category MULTIDISCIPLINARY SCIENCES, which comprises 57 scientific journals.

1.5. Methods.

This section details the proposed improvements for creating new predictive models of cardiovascular disease. Before doing so, it summarises the basic models on which those improvements rely: the Cox survival model with time-dependent variables, the construction of a points system following the Framingham Heart Study methodology, Joint Models for Longitudinal and Time-to-Event Data, and dynamic predictions using Joint Models.

1.5.1. Basic models used in the proposed improvements.

Cox survival model with time-dependent variables

Let $T$ be a non-negative random variable denoting the survival time, defined as the minimum of the true time to event $T^*$ and the censoring time $C$ (non-informative right censoring). In other words, $T = \min(T^*, C)$. We also define $\delta$ as the event indicator, which takes the value 1 if $T^* \le C$ and 0 otherwise. Furthermore, let $\boldsymbol{W}$ be the vector of baseline covariates and $\boldsymbol{Y}(t)$ the vector of time-dependent covariates, whose values are defined for all $t \ge 0$. With these elements, the Cox model with time-dependent covariates takes the following form (hazard function):

$$h(t \mid \boldsymbol{w}, \boldsymbol{y}(t)) = h_0(t) \exp\{\boldsymbol{\gamma}^T \boldsymbol{w} + \boldsymbol{\alpha}^T \boldsymbol{y}(t)\},$$

where $h_0(t)$ is the baseline hazard function, and $\boldsymbol{\gamma}$ and $\boldsymbol{\alpha}$ are the vectors of regression coefficients for the baseline and time-dependent covariates, respectively (Andersen & Hill, 1982).



The estimation of the model parameters is based on the partial likelihood function (Andersen & Hill, 1982). We must also check whether the functional form of each covariate in the model is linear. This should be done graphically, by plotting the martingale residuals against the covariate of interest. Finally, we must assess whether the model fits our data adequately through the analysis of the Cox-Snell residuals (graphical test).

The classical Cox model (without time-dependent variables) drops $\boldsymbol{\alpha}$ and $\boldsymbol{y}(t)$ from the expression above. In addition, the model must satisfy the following condition (proportional hazards assumption):

$$\log\left\{\frac{h(t \mid \boldsymbol{w})}{h_0(t)}\right\} = \boldsymbol{\gamma}^T \boldsymbol{w}.$$

Construction of a points system from a Cox survival model using the Framingham Heart Study methodology

We now summarise the main steps for adapting a Cox regression model with $p$ covariates to a points system, using the methodology developed by the Framingham Heart Study investigators (Sullivan, Massaro & D'Agostino, 2004):

1) Estimate the model parameters: $\hat{\boldsymbol{\gamma}}$.

2) Organise the risk factors into categories and determine reference values:

a. Continuous risk factor (example: age): form contiguous groupings of the factor and determine a reference value for each grouping. Example for age: 18-30 [24], 30-39 [34.5], 40-49 [44.5], 50-59 [54.5], 60-69 [64.5] and ≥70 years [74.5]. The reference values are given in square brackets. The Framingham Heart Study investigators recommend the midpoint of each interval as a suitable reference value and, for the first and last categories, the mean of the extreme value and the 1st percentile (first category) or the 99th percentile (last category).

b. Binary risk factors (example: sex, 0 female and 1 male): the reference values are 0 and 1.

$W_{ij}$ denotes the reference value for category $j$ of risk factor $i$, where $i = 1, \ldots, p$ and $j = 1, \ldots, c_i$ (total number of categories for risk factor $i$).

3) Determine the base category for each risk factor: the base category is the one that scores 0 points in the system and is denoted by $W_{iREF}$, $i = 1, \ldots, p$.

4) Determine how many regression units separate each category from the base category: compute $\hat{\gamma}_i \cdot (W_{ij} - W_{iREF})$, $i = 1, \ldots, p$ and $j = 1, \ldots, c_i$.

5) Set the constant $B$: the number of regression units equivalent to 1 point in the system. The Framingham Heart Study uses the regression units corresponding to an increase of 5 (or 10) years of age.

6) Determine the number of points for each category of every risk factor: the integer closest to $\hat{\gamma}_i \cdot (W_{ij} - W_{iREF}) / B$.

7) Determine the risk associated with the total score (the sum of the points assigned for each risk factor):

$$1 - \hat{S}_0(t)^{\exp\left\{\sum_{i=1}^{p} \hat{\gamma}_i \cdot W_{iREF} + B \cdot \text{Points} - \sum_{i=1}^{p} \hat{\gamma}_i \cdot \bar{w}_i\right\}},$$

where $\hat{S}_0(t)$ is calculated using the Kaplan-Meier estimator.
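The steps above can be sketched in a few lines of Python. This is only an illustrative sketch under stated assumptions: the coefficient values, the baseline survival and the covariate means are hypothetical numbers chosen for the example, and the function names are ours, not part of the Framingham methodology.

```python
import math
from typing import List


def points_for_categories(gamma: float, ref_values: List[float],
                          base_ref: float, B: float) -> List[int]:
    """Steps 4 and 6: regression units from the base category, in points."""
    # floor(x + 0.5) rounds halves upwards instead of to the nearest even integer.
    return [math.floor(gamma * (w - base_ref) / B + 0.5) for w in ref_values]


def risk_from_points(points: int, s0_t: float, B: float,
                     sum_gamma_ref: float, sum_gamma_mean: float) -> float:
    """Step 7: 1 - S0(t) ** exp(sum(gamma*W_ref) + B*points - sum(gamma*w_bar))."""
    linear = sum_gamma_ref + B * points - sum_gamma_mean
    return 1.0 - s0_t ** math.exp(linear)


# Hypothetical coefficients (assumptions, not estimates from any study).
GAMMA_AGE, GAMMA_SMOKER = 0.05, 0.75
B = 5 * GAMMA_AGE  # one point = regression units of a 5-year increase in age

age_points = points_for_categories(GAMMA_AGE, [34.5, 44.5, 54.5, 64.5, 74.5], 34.5, B)
smoker_points = points_for_categories(GAMMA_SMOKER, [0, 1], 0, B)

# Hypothetical baseline survival at t and hypothetical covariate means.
sum_ref = GAMMA_AGE * 34.5 + GAMMA_SMOKER * 0
sum_mean = GAMMA_AGE * 50 + GAMMA_SMOKER * 0.3
risk = risk_from_points(age_points[2] + smoker_points[1], 0.95, B, sum_ref, sum_mean)
print(age_points, smoker_points, round(risk, 4))
```

Rounding to whole points is what makes the system easy to use at the bedside, and it is also the source of the loss of precision with respect to the underlying Cox model.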



Joint Models for Longitudinal and Time-to-Event Data

Using the notation above, we have the vector of random variables $\{T, \boldsymbol{W}, Y(t)\}$, where $Y(t)$ is a single time-dependent variable (longitudinal parameter) whose values are defined only intermittently in $t$. In other words, for a subject $i$ ($i = 1, \ldots, n$), $y(t)$ is only defined at the times $t_{ij}$ ($j = 1, \ldots, n_i$), giving the values $y_i(t_{ij})$, where $0 \le t_{i1} \le t_{i2} \le \cdots \le t_{in_i}$ ($t_{in_i}$ is the latest time at which the longitudinal parameter was recorded for subject $i$). We now denote by $m(t)$ the true, unobserved value of the longitudinal parameter at time $t$ ($m_i(t)$ for subject $i$). To assess the effect of $m(t)$ on the risk of an event, the standard option is to fit a Cox regression model with a single time-dependent variable:

$$h(t \mid \mathcal{M}(t), \boldsymbol{w}) = h_0(t) \exp\{\boldsymbol{\gamma}^T \boldsymbol{w} + \alpha\, m(t)\},$$

where $\mathcal{M}(t)$ is defined for a subject $i$ as $\mathcal{M}_i(t) = \{m_i(u);\ 0 \le u < t\}$, which denotes the history of the true, unobserved value of the longitudinal parameter up to time $t$. The remaining parameters in the expression follow the structure of the Cox regression model with time-dependent variables detailed above. The baseline hazard function can be left unspecified or approximated using splines or piecewise-defined functions (Rizopoulos, 2012).

In the expression above we used $m(t)$, the true and unobserved value of the longitudinal parameter, whereas our sample contains the observed values $y(t)$. We therefore estimate $m(t)$ from $y(t)$ through a linear mixed-effects model, which describes the evolution of the longitudinal parameter for each subject:

$$\begin{cases} y_i(t) = m_i(t) + \varepsilon_i(t) \\ m_i(t) = \boldsymbol{x}_i^T(t)\boldsymbol{\beta} + \boldsymbol{z}_i^T(t)\boldsymbol{b}_i \\ \boldsymbol{b}_i \sim N(\boldsymbol{0}, \boldsymbol{D}) \\ \varepsilon_i(t) \sim N(0, \sigma^2) \end{cases},$$



where $\boldsymbol{\beta}$ and $\boldsymbol{b}_i$ denote the vectors of regression coefficients for the fixed and random effects, respectively; $\boldsymbol{x}_i(t)$ and $\boldsymbol{z}_i(t)$ denote the row vectors of the design matrices for the fixed and random effects, respectively; and $\varepsilon_i(t)$ is the error term, with variance $\sigma^2$. Finally, $\boldsymbol{b}_i$ follows a normal distribution with mean $\boldsymbol{0}$ and variance-covariance matrix $\boldsymbol{D}$, and is independent of $\varepsilon_i(t)$ (Rizopoulos, 2012).

The estimation of the model parameters is based on maximising the likelihood function of the joint model (Rizopoulos, 2012).

The model assumptions must be checked for both submodels (longitudinal and survival) using residual plots. For the longitudinal part, we plot the subject-specific residuals against the fitted values, the Q-Q plot of the subject-specific residuals, and the marginal residuals against the fitted values. For the survival part, we plot the fitted values of the longitudinal parameters against the martingale residuals, and finally we check graphically whether the Cox-Snell residuals follow a (censored) exponential distribution with unit parameter (Rizopoulos, 2012). As for the last component of the joint model for which an assumption was stated (the random effects), other authors have shown that linear mixed-effects models are robust to misspecification of this distribution (Verbeke & Lesaffre, 1997).

Predictions of the longitudinal parameters using Joint Models for Longitudinal and Time-to-Event Data

Let $\{t_i, \delta_i, \boldsymbol{w}_i, y_i(t_{ij}),\ 0 \le t_{ij} \le t_i,\ j = 1, \ldots, n_i\}$, $i = 1, \ldots, n$, be a random sample of the vector of random variables $\{T, \Delta, \boldsymbol{W}, Y\}$, using the notation above. Suppose also that we have fitted a joint model such as the one detailed earlier. We are now interested in predicting the expected value of the longitudinal parameter at a time $u > t$ for a new subject $i$ whose observed history of the longitudinal parameter up to time $t$ is $\mathcal{Y}_i(t) = \{y_i(s);\ 0 \le s < t\}$:

$$\omega_i(u \mid t) = E_Y\{y_i(u) \mid t_i^* > t, \mathcal{Y}_i(t), \boldsymbol{w}_i; \boldsymbol{\theta}\},$$

where $\boldsymbol{\theta}$ denotes the vector of parameters of the joint model (Rizopoulos, 2011).

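Rizopoulos developed a Monte Carlo approach to this task, based on a Bayesian formulation, and obtained the following simulation algorithm (Rizopoulos, 2011):

Step 1: Simulate $\boldsymbol{\theta}^{(l)} \sim \mathcal{N}(\hat{\boldsymbol{\theta}}, \widehat{\operatorname{var}}(\hat{\boldsymbol{\theta}}))$.

Step 2: Simulate $\boldsymbol{b}_i^{(l)} \sim \{\boldsymbol{b}_i \mid t_i^* > t, \mathcal{Y}_i(t), \boldsymbol{w}_i; \boldsymbol{\theta}^{(l)}\}$.

Step 3: Compute $\omega_i^{(l)}(u \mid t) = \boldsymbol{x}_i^T(u)\boldsymbol{\beta}^{(l)} + \boldsymbol{z}_i^T(u)\boldsymbol{b}_i^{(l)}$.

This algorithm is repeated $L$ times. The estimate of the longitudinal parameter is the mean (or the median) of the computed values ($\omega_i^{(l)}(u \mid t)$, $l = 1, \ldots, L$), and the confidence interval is formed by the corresponding percentiles (for 95%: the 2.5th and 97.5th percentiles) (Rizopoulos, 2011).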
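The structure of the three steps can be illustrated with a deliberately simplified Python sketch. It is not Rizopoulos's implementation: we assume a toy longitudinal submodel with fixed part $\beta_0 + \beta_1 t$ and a single random intercept, independent estimates of $\beta_0$ and $\beta_1$, and, most importantly, Step 2 samples the random intercept from its normal posterior given the observed history only, ignoring the survival condition $t_i^* > t$. All names and numeric values are hypothetical.

```python
import random
import statistics


def predict_longitudinal(u, times, values, beta_hat, beta_se, D, sigma2,
                         L=2000, seed=0):
    """Monte Carlo prediction of a random-intercept trajectory at time u.

    Toy assumptions: m(t) = beta0 + beta1 * t + b, with b ~ N(0, D) and
    measurement error variance sigma2; the survival condition is ignored.
    """
    rng = random.Random(seed)
    n = len(times)
    draws = []
    for _ in range(L):
        # Step 1: draw the parameters around their estimates.
        beta0 = rng.gauss(beta_hat[0], beta_se[0])
        beta1 = rng.gauss(beta_hat[1], beta_se[1])
        # Step 2 (simplified): draw b from its normal posterior given the history.
        resid_sum = sum(y - (beta0 + beta1 * t) for t, y in zip(times, values))
        post_var = 1.0 / (1.0 / D + n / sigma2)
        post_mean = post_var * resid_sum / sigma2
        b = rng.gauss(post_mean, post_var ** 0.5)
        # Step 3: evaluate the subject-specific trajectory at time u.
        draws.append(beta0 + beta1 * u + b)
    draws.sort()
    lower = draws[int(0.025 * (L - 1))]
    upper = draws[int(0.975 * (L - 1))]
    return statistics.median(draws), lower, upper


# Hypothetical history: three systolic blood pressure readings at years 0, 1, 2.
estimate, low, high = predict_longitudinal(
    u=5.0, times=[0, 1, 2], values=[140, 142, 145],
    beta_hat=(138.0, 2.0), beta_se=(1.0, 0.2), D=25.0, sigma2=9.0)
print(round(estimate, 1), round(low, 1), round(high, 1))
```

In a real joint model, Step 2 has no closed form because the posterior of the random effects also depends on the survival submodel, so it is usually sampled with a Metropolis-Hastings scheme.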

We stress that these predictions are dynamic in nature: as new information on the patient is recorded, the predictions are updated with it.

1.5.2. Short-, medium- and long-term predictions.

Once the coefficients of a survival model have been adapted to a points system built with the Framingham study methodology (Sullivan, Massaro & D'Agostino, 2004), the following elements are available:



1) Estimates of the $p$ coefficients of the regression model associated with our risk factors ($\boldsymbol{w}$): $\hat{\boldsymbol{\gamma}}$.

2) The value of $\sum_{i=1}^{p} \hat{\gamma}_i \cdot \bar{w}_i$.

3) All the explanatory variables, categorised. For a quantitative variable such as age in years, suppose its range is 30-79 years, so that we can use the age categories 30-39, 40-49, 50-59, 60-69 and 70-79 years. If instead a factor is modelled by a set of binary variables, such as being a smoker, the categorisation is: absent (the subject is not a smoker) and present (the subject is a smoker).

4) A score associated with each category. For the age categories above, the scoring scheme could be:

a. 30-39 years → 0 points.

b. 40-49 years → 2 points.

c. 50-59 years → 4 points.

d. 60-69 years → 6 points.

e. 70-79 years → 8 points.

For smoking, the scores could be:

a. Non-smoker → 0 points.

b. Smoker → 3 points.

Adding up the score associated with each category gives each subject's total score.

5) The estimate of the survival probability, obtained through the Kaplan-Meier estimator, for all the time points recorded during the study.

6) The constant $B$.

7) Reference values for each risk factor: $W_{iREF}$.

Using all this information, we can calculate the probability of an event for a subject with a score $P$ at a time $t$ through the following expression:

$$1 - \hat{S}_0(t)^{\exp\left\{\sum_{i=1}^{p} \hat{\gamma}_i \cdot W_{iREF} + B \cdot P - \sum_{i=1}^{p} \hat{\gamma}_i \cdot \bar{w}_i\right\}}$$

Both the Framingham Heart Study investigators and those of the SCORE project have always used a value of 10 years for $t$, which does not allow short- or medium-term predictions, as discussed above in the section Background and current state of the topic. However, the expression above can be evaluated at any value of $t$ within the follow-up range of the study. This yields short- and medium-term predictions as well as long-term ones (10 years). For example, substituting $t = 2, 4, 6, 8, 10$ years gives the probability of a cardiovascular event every two years, which depends only on each new subject's score and on the prediction time, since the remaining parameters are constant in the calculation.
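As a minimal sketch of this idea, the following Python fragment evaluates the risk expression at several horizons for a given score. The baseline survival values, the constant $B$ and the two sums are hypothetical numbers chosen only for illustration; in practice they would come from the fitted model and the Kaplan-Meier estimate.

```python
import math
from typing import Dict

# Hypothetical baseline survival estimates at each horizon (in years).
BASELINE_SURVIVAL: Dict[int, float] = {2: 0.99, 4: 0.975, 6: 0.96, 8: 0.94, 10: 0.92}


def event_probability(points: int, s0_t: float, B: float,
                      sum_gamma_ref: float, sum_gamma_mean: float) -> float:
    """Risk at one horizon: 1 - S0(t) ** exp(sum_ref + B * points - sum_mean)."""
    return 1.0 - s0_t ** math.exp(sum_gamma_ref + B * points - sum_gamma_mean)


def risk_by_horizon(points: int, baseline: Dict[int, float], B: float,
                    sum_gamma_ref: float, sum_gamma_mean: float) -> Dict[int, float]:
    """Evaluate the same score at every horizon; only S0(t) changes with t."""
    return {t: event_probability(points, s0, B, sum_gamma_ref, sum_gamma_mean)
            for t, s0 in baseline.items()}


# Hypothetical constants: B, sum of gamma * W_REF, sum of gamma * mean covariate.
for horizon, prob in risk_by_horizon(5, BASELINE_SURVIVAL, 0.25, 1.725, 2.725).items():
    print(f"{horizon} years: {prob:.3f}")
```

Because everything except $\hat{S}_0(t)$ is fixed once the points system has been built, producing a chart for a new horizon only requires reading one more value from the estimated baseline survival curve.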

Once our points system has been built with predictions at several time cut-offs, a difficulty arises: how to validate the points system to determine whether the observed cardiovascular events differ from those predicted by the system.

The key to computing the observed events is that a subject who suffered a cardiovascular event during follow-up did so at a fixed time $t^*$. If we are computing the observed events before that time, the subject must be counted as free of cardiovascular events, whereas if the time cut-off is later than $t^*$, the participant is counted as having experienced an event.

After computing the observed and expected events at each preset time point (short, medium or long term), it only remains to determine whether they differ using Pearson's χ² test, whose value will depend on the chosen time cut-off (in the example above: 2, 4, 6, 8 and 10 years).
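The counting rule and the test statistic can be sketched as follows. This is an illustrative sketch under our own assumptions: subjects are compared in risk groups, the expected count of a group is the sum of its predicted probabilities, and subjects censored before the cut-off are simply counted as event-free. The data are invented for the example.

```python
from typing import List, Sequence, Tuple


def observed_events(follow_up: Sequence[Tuple[float, bool]], cutoff: float) -> int:
    """Count subjects whose event occurred at or before the cut-off.

    follow_up holds (time, event) pairs; an event after the cut-off does not
    count, so the same subject may be event-free at 2 years and a case at 4.
    """
    return sum(1 for time, event in follow_up if event and time <= cutoff)


def pearson_chi2(observed: List[float], expected: List[float]) -> float:
    """Pearson statistic: sum over groups of (O - E)^2 / E."""
    return sum((o - e) ** 2 / e for o, e in zip(observed, expected))


# Invented follow-up for one risk group: (years to event or censoring, event flag).
group = [(1.5, True), (3.0, True), (5.0, False), (9.0, True)]
for cutoff in (2, 4, 6, 8, 10):
    print(cutoff, observed_events(group, cutoff))

# Invented observed and expected counts for two risk groups at one cut-off.
print(round(pearson_chi2([10, 20], [12.0, 18.0]), 4))
```

The statistic would then be compared with the χ² distribution for the appropriate degrees of freedom, once for each cut-off.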

1.5.3. Determining the best combination of variables to predict a cardiovascular event.

Suppose we have a total of $m$ explanatory variables with which to predict a cardiovascular event through a survival model, for example a Cox regression model. Suppose further that at most $r$ variables can be entered into the model, since we must pay close attention to its convergence and to the ratio between the number of events and the number of explanatory variables in the model.

Most studies have proposed a stepwise method based on a statistical test, such as the likelihood ratio test or the score test. The stepwise procedure is usually forward (starting from the null model and adding variables according to the chosen test) or backward (starting from the saturated model, with all the explanatory variables, and removing variables according to the chosen test). Combined stepwise methods, which mix the two previous approaches (forward and backward), have also been considered. However, when making a prediction, what we want is the set of variables that best explains our response variable, that is, the occurrence of cardiovascular disease over time.



For survival models a statistic has been proposed, known as the $C$ statistic (Pencina & D'Agostino, 2004), which measures the discriminative capacity of the model for the response variable. The statistic lies between 0 and 1, and the closer it is to 1, the better the model discriminates. It is therefore of great interest to find the combination of variables with the highest $C$ statistic, since it will best discriminate between the subjects who will suffer a cardiovascular disease and those who will not; that is, its predictions will be much more accurate.
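To make the idea of discrimination concrete, the following Python sketch computes a simple pairwise concordance index for right-censored data. It is a simplified illustration in the spirit of the $C$ statistic; the exact estimator of Pencina & D'Agostino (2004) may treat ties and censoring differently, and the data below are invented.

```python
from typing import Sequence


def c_statistic(times: Sequence[float], events: Sequence[bool],
                scores: Sequence[float]) -> float:
    """Proportion of usable pairs in which the earlier event has the higher score.

    A pair (i, j) is usable when subject i had an event strictly before the
    follow-up time of subject j. Tied scores count as half a concordance.
    """
    concordant = 0.0
    usable = 0
    for i in range(len(times)):
        if not events[i]:
            continue  # a censored subject cannot be the earlier event of a pair
        for j in range(len(times)):
            if times[i] < times[j]:
                usable += 1
                if scores[i] > scores[j]:
                    concordant += 1.0
                elif scores[i] == scores[j]:
                    concordant += 0.5
    if usable == 0:
        raise ValueError("no usable pairs")
    return concordant / usable


# Invented data: higher scores fail earlier, so discrimination is perfect.
print(c_statistic([1, 2, 3, 4], [True, True, False, True], [4, 3, 2, 1]))
```

A value of 0.5 corresponds to a model that orders subjects no better than chance, which is why values close to 1 are sought.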

The methods above rely on other kinds of statistical test, so they do not take this into account. For that reason, we now detail a methodology that selects the best combination of variables for predicting the occurrence of a cardiovascular disease (the one with the maximum $C$ statistic).

As indicated above, we can select at most $r$ variables out of a total of $m$. We propose the following algorithm:

$i = r$:

a. Compute all possible combinations of the $m$ variables taken $r$ at a time without repetition, that is, $\binom{m}{i} = \binom{m}{r}$.

b. Determine the $C$ statistic for each of these combinations of variables.

$i = r - 1$:

a. Repeat the process, now taking the combinations $r - 1$ at a time without repetition: $\binom{m}{i} = \binom{m}{r-1}$.

b. Compute the $C$ statistic again for each combination.

…

$i = 1$:

a. Finally, compute all the combinations of the $m$ variables taken 1 at a time without repetition, that is, $\binom{m}{i} = \binom{m}{1} = m$. In other words, we test $m$ models, each containing a single, different variable.

b. Compute the $C$ statistic for the $m$ models.

Once this whole process has been completed, we select the combination with the maximum $C$ statistic, since it will be the model with the greatest discriminative capacity and the most accurate predictions of cardiovascular disease.

The total number of combinations tested by this new algorithm is:

$$\binom{m}{r} + \binom{m}{r-1} + \binom{m}{r-2} + \cdots + \binom{m}{2} + \binom{m}{1}.$$

For example, if 𝑚 = 20 and 𝑟 = 8, the algorithm will check a total of 263,949 combinations and keep the one with the highest 𝐶 statistic.
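The size of this search is easy to verify. The following is a minimal Python sketch, not the code used in the thesis: the names `total_combinations` and `best_subset` are invented for illustration, and `score` is a hypothetical caller-supplied function standing in for "fit the model on this subset and return its 𝐶 statistic".

```python
from itertools import combinations
from math import comb


def total_combinations(m: int, r: int) -> int:
    """Number of non-empty subsets of m variables with at most r elements."""
    return sum(comb(m, i) for i in range(1, r + 1))


def best_subset(variables, r, score):
    """Exhaustively score every subset of size 1..r and return the best one.

    `score` is a hypothetical callable: in practice it would fit the model
    on the subset and return its C statistic.
    """
    best, best_score = None, float("-inf")
    for i in range(r, 0, -1):  # i = r, r - 1, ..., 1, as in the algorithm
        for subset in combinations(variables, i):
            s = score(subset)
            if s > best_score:
                best, best_score = subset, s
    return best, best_score


print(total_combinations(20, 8))  # the count quoted in the text: 263949
```

Because every subset is fitted once, the cost grows quickly with 𝑚 and 𝑟, which is why the cap 𝑟 on the number of variables matters in practice.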



1.5.4. Construction, validation and use of a points system with repeated measures of cardiovascular risk factors.

Construction

We want to determine the probability of suffering a cardiovascular disease from baseline (𝑡 = 0) up to a fixed time point $\bar{t}$, given a set of risk factors measured at baseline and during follow-up. To achieve this we take the following steps:

1) Fit a Cox regression model with time-dependent variables to the sample collected. Since a joint model with multiple longitudinal parameters cannot be estimated (Rizopoulos, 2012), we use the classical extended Cox model (without a shared structure), which requires the values of all the longitudinal parameters at every time point. Because these values are not known (the parameters are recorded intermittently), we take the last recorded value as the reference.

2) Use the methodology of the Framingham study to adapt the coefficients of the fitted model to a points system and determine the probability of cardiovascular disease up to time $\bar{t}$ for each score (Sullivan, Massaro & D'Agostino, 2004). From these probabilities, build risk groups that are easy for the clinician to understand (for example, in multiples of 5%).

3) Fit a joint model for longitudinal and time-to-event data for each longitudinal parameter recorded during follow-up. Each of these models also includes all the baseline variables. They are built in order to predict the longitudinal parameters for new patients (validation, or use after validation).
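The "last recorded value" rule in step 1) can be illustrated with a short Python sketch. It is only an illustration under stated assumptions: the function name `last_value` and the SBP readings are invented, and the thesis does not specify how the rule was implemented.

```python
from bisect import bisect_right


def last_value(times, values, t):
    """Return the most recent measurement taken at or before time t.

    `times` must be sorted in increasing order; returns None when no
    measurement exists yet at time t.
    """
    k = bisect_right(times, t)
    return values[k - 1] if k > 0 else None


# Hypothetical SBP readings (mmHg) at days 0, 90 and 180 of follow-up.
days = [0, 90, 180]
sbp = [145, 138, 150]
print(last_value(days, sbp, 120))  # the day-90 reading is carried forward
```

Each intermittent series thus becomes a step function of time, which is the form the extended Cox model needs.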



Statistical validation through simulation

Once the points system has been built, we want to check whether the model accurately determines the onset of cardiovascular disease in a different set of subjects (the validation sample). For this validation sample we know, at baseline, the history of the longitudinal markers up to 𝑡 = 0 [the record of the cardiovascular risk factors in the clinical history (𝑡 < 0)] and the values of the baseline variables. With this information we determine each subject's probability of experiencing an event and compare it with what actually happened, that is, we determine whether the model is valid. To assess this validity we take the following steps:

1) Generate 𝐿 simulations of the longitudinal parameters at time $\bar{t}$ using the models built in step 3) of the construction, from each subject's history (𝑡 < 0) and baseline variables (𝑡 = 0). With these simulated values we build a distribution of the score for each subject in the sample; that is, each subject has 𝐿 values of the score variable (it suffices to evaluate the points system with the simulated values and the baseline variables). Likewise, within each 𝑙-th simulation every patient has one score value. In other words, each simulation has its own distribution of scores.

2) For each 𝑙-th simulation, fit a classical Cox model (without time-dependent variables) with the resulting score as the only explanatory variable. For each of these 𝐿 models, determine the 𝐶 statistic (Harrell's statistic) (Pencina & D'Agostino, 2004). This gives a distribution of values for the statistic, from which we compute the mean (or the median) and the 2.5th and 97.5th percentiles (Rizopoulos, 2011). In this way we build a confidence interval for the statistic, which indicates the discriminative ability of the points system to determine which patients will develop a cardiovascular disease.

3) For each patient in the validation sample, compute the median of their distribution of scores. Note that we do not use the mean, because it could have decimals, which makes no sense when applying the points system. Using these medians, classify each patient into a risk group and check whether the event rate predicted by the points system in each group differs from the observed rate. The test used for this is Pearson's χ² test.
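Step 2) can be sketched in Python as follows. This is an illustration, not the thesis code. It assumes that a Cox model with a single covariate and a positive coefficient ranks subjects in the same order as the covariate itself, so the concordance is computed on the raw score instead of fitting the model. The function names and the simple index-rounding percentile are choices made here.

```python
from statistics import median


def harrell_c(times, events, scores):
    """Harrell's concordance index for right-censored data.

    A pair (i, j) is usable when the subject with the shorter time had the
    event. It is concordant when that subject also has the higher score;
    tied scores count one half. Pairs with tied times are skipped.
    """
    concordant, usable = 0.0, 0
    n = len(times)
    for i in range(n):
        for j in range(n):
            if times[i] < times[j] and events[i]:
                usable += 1
                if scores[i] > scores[j]:
                    concordant += 1.0
                elif scores[i] == scores[j]:
                    concordant += 0.5
    return concordant / usable


def percentile(sorted_values, p):
    """Percentile (p in [0, 100]) of a sorted list, by rounding the index."""
    k = round(p / 100 * (len(sorted_values) - 1))
    return sorted_values[k]


def c_interval(times, events, simulated_scores):
    """Median and 2.5th/97.5th percentiles of C over the L simulations."""
    cs = sorted(harrell_c(times, events, s) for s in simulated_scores)
    return median(cs), percentile(cs, 2.5), percentile(cs, 97.5)
```

Here `simulated_scores` is a list of 𝐿 score vectors, one per simulation, each holding one score per patient in the same order as `times` and `events`.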

The literature describes several limitations of the 𝐶 statistic (Lloyd-Jones, 2010). First, it does not assess whether the observed and estimated risks are similar across subjects. For this reason we have added the analysis of the differences between observed and expected events, which mitigates this problem. Second, it is very sensitive to large hazard ratio values (≥9). However, since all the variables are treated as quantitative (not categorized), the hazard ratios do not exceed this threshold. In short, the joint analysis of Harrell's concordance index and of the differences between observed and expected events gives us a satisfactory way to validate the proposed model statistically by simulation.
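For reference, one common form of Pearson's statistic for the comparison of observed and expected events is sketched below. The notation is introduced here and is not taken from the thesis: $G$ is the number of risk groups, and $O_g$ and $E_g$ are the observed and expected numbers of events in group $g$.

```latex
\chi^{2} = \sum_{g=1}^{G} \frac{(O_g - E_g)^{2}}{E_g}
```

Large values indicate that the predicted event rates depart from the observed ones, so a non-significant result is the outcome the validation hopes to find.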

Potential use

Once the points system has been statistically validated through simulation, the clinician can apply it to determine a new patient's cardiovascular risk and possible interventions to reduce it. Bear in mind that the healthcare professional will have historical information on the longitudinal parameters (𝑡 < 0) and baseline information on the new patient (𝑡 = 0). The clinician should follow these steps:

1) Determine the value of each longitudinal parameter at time $\bar{t}$. To do so, apply the models obtained in step 3) of the construction to the new patient's history and baseline information, in order to generate 𝐿 simulations of each longitudinal parameter, as in the validation process. For each 𝑙-th simulation, determine the score corresponding to the resulting cardiovascular risk factor profile (simulated values and baseline information). This yields a distribution of the score for the new patient.

2) Determine the median and the 2.5th and 97.5th percentiles of the score vector built in the previous step. The median is the estimate of the new patient's score and the percentiles define its confidence interval (Rizopoulos, 2011). Moreover, since each score has an associated risk, the healthcare professional will know the probability of cardiovascular disease at time $\bar{t}$, together with its confidence interval. Finally, we report to the clinician the values of the biological parameters at $\bar{t}$ that correspond to the median score, so that they can see which of them score above the normal level, that is, the possible points of intervention to reduce cardiovascular risk.

3) Now that the clinician knows the cardiovascular risk and which parameters score above the normal level, they must design the best intervention for the patient. This poses a problem, because we need to know the value of each biological parameter at time $\bar{t}$: the clinician has an approximation based on simulations built from the patient's history, but does not know how the interventions will affect cardiovascular risk.
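Step 2) amounts to summarizing a vector of simulated scores. The Python sketch below illustrates this under stated assumptions: the function name, the index-rounding percentiles and the score-to-risk table are all hypothetical, and `median_low` is used here so that the point estimate is always an attainable integer score even when 𝐿 is even.

```python
from statistics import median_low


def summarize_scores(scores, risk_by_score):
    """Point estimate and percentile interval of a simulated score vector."""
    ordered = sorted(scores)
    n = len(ordered)
    lower = ordered[round(0.025 * (n - 1))]  # 2.5th percentile
    upper = ordered[round(0.975 * (n - 1))]  # 97.5th percentile
    point = median_low(ordered)
    return {
        "score": (point, lower, upper),
        "risk": tuple(risk_by_score[s] for s in (point, lower, upper)),
    }


# Invented example: L = 100 simulated scores and a made-up risk table.
simulated = [15] * 10 + [16] * 80 + [17] * 10
risk_table = {15: 0.10, 16: 0.12, 17: 0.15}
summary = summarize_scores(simulated, risk_table)
print(summary)
```

Reading the risk at the median and at both percentiles gives the clinician the estimated probability together with its interval in a single lookup.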



From the previous step, the healthcare professional knows which parameters to act on, their history and their baseline values. Based on these measurements, the physician sets a realistic target for the patient's next visit at a time $t^{*}$ ($0 < t^{*} < \bar{t}$). The target values at $t^{*}$ are then inserted into the history of each biological parameter and its value at time $\bar{t}$ is determined; that is, 𝐿 simulations of each cardiovascular risk factor are generated with the earlier models (step 3 of the construction), adding a new value to the history (at $t^{*}$).

These calculations give the benefit of the intervention (the estimate [mean or median] of the biological parameter at $\bar{t}$), and the points system shows how the patient's risk would decrease.

1.6. Results.

We now detail the results of applying the proposed improvements to existing cardiovascular models, organized by the specific objectives achieved:

1.6.1. Short-, medium- and long-term predictions.

Figures 1 and 2 show how the probabilities of cardiovascular disease vary as the prediction time changes. In Figure 1 the probability is given for each year up to a maximum of 4 (Ramírez-Prado et al., 2015), whereas in Figure 2 the probability of an event is calculated every 2 years up to a maximum of 14 (Artigao-Ródenas et al., 2015).


Figure 1: Cardiovascular risk scale for hospitalized patients with type 2 diabetes (Ramírez-Prado et al., 2015).

Figure 2: Predictive model to determine which patients will suffer a cardiovascular disease within a maximum period of 14 years.

LVH, left ventricular hypertrophy; PA, physical activity; SBP, systolic blood pressure. Definition of physical activity (Food and Agriculture Organization of the United Nations & World Health Organization, 1973): 1) Light activity: that of someone who works seated in an office or behind a counter with automated instruments; 2) Moderate activity: continuous light physical activity, as in light industrial work or off-season agricultural work; 3) Intense activity: heavy and sometimes strenuous work (agricultural production, work in mines or foundries). A person who was not working was considered to perform light activity.



Moreover, when validating the points system with a maximum prediction period of 14 years (Figure 3), we see how the event rate increases over time (every 2 years).


Figure 3: Differences between the risk groups built in the validation sample (Artigao-Ródenas et al., 2015).

1.6.2. Determining the best combination of variables to predict a cardiovascular event.

The proposed algorithm was applied in the article included in this doctoral thesis on the construction of a cardiovascular risk scale for patients with type 2 diabetes admitted through an emergency department (Ramírez-Prado et al., 2015). A total of 15 candidate explanatory variables were available, of which at most 7 could be selected, so a total of 16,383 combinations were checked; the best combination gave the predictive model a 𝐶 statistic of 0.734.

1.6.3. Construction, validation and use of a points system with repeated measures of cardiovascular risk factors.

Since no real data were available for applying the new method, we worked with simulated data, with the sole aim of showing the reader how to use it. Note that two simulations were carried out, giving two simulated data sets: one was used to build the model and the other to validate it. To give both sets biological plausibility, we used estimates obtained in the Puras-GEVA cardiovascular study (Artigao-Ródenas et al., 2015).

Our data sets include the following biological parameters: age (years), systolic blood pressure (SBP) (mmHg), glycated haemoglobin (HbA1c) (%), atherogenic index, sex (male or female) and smoking (yes or no). Of these, SBP, HbA1c and the atherogenic index are available at baseline (𝑡 = 0) and either during follow-up for the construction sample (𝑡 > 0) or in the clinical history for the sample used for statistical validation through simulation (𝑡 < 0). The choice of these variables was based on current cardiovascular risk scales (Conroy et al., 2003; National Heart, Lung, and Blood Institute, 2015), except for HbA1c, which was used instead of the diagnosis of diabetes mellitus in order to include another time-dependent parameter in the final model; it also allows us to assess the role of diabetes control (HbA1c <6.5%) in preventing cardiovascular disease.

Regarding the main variable (time to cardiovascular disease), we assume that our cohort is used to predict the onset of cardiovascular disease with a follow-up of 2 years. Note that traditional cardiovascular risk scales use a horizon of 10 years (Conroy et al., 2003; National Heart, Lung, and Blood Institute, 2015). We chose a shorter horizon because we predict the longitudinal parameters from baseline (𝑡 = 0) up to the prediction time; with a 10-year horizon, these predictions would be highly variable and would not allow accurate predictions of which patients will develop a cardiovascular disease, which would render the proposed method useless. However, because the predictions of the longitudinal parameters are dynamic, the 2-year risk can be determined with greater precision every time the patient visits the healthcare professional. Nevertheless, the proposed method has been developed for a theoretical time $\bar{t}$ and can be applied for any value, although the longitudinal parameters will generally show greater variability over longer horizons; this clearly depends on the nature of the data, at both the individual and the population level (Rizopoulos, 2012).

The study on which our simulated data set is based (Artigao-Ródenas et al., 2015) developed and validated a predictive model of cardiovascular disease (angina of any kind, myocardial infarction, stroke, peripheral arterial disease of the lower limbs, or cardiovascular death) to calculate short-, medium- and long-term risk in the general population (the risk was calculated for each score every two years up to a maximum of fourteen). Table 4 of that points system shows why this matters. For example, a patient with 9 points has a probability of cardiovascular disease of 0.67% at 2 years, whereas at 10 years it rises to 5.16% (Artigao-Ródenas et al., 2015). If we periodically calculate our patient's 2-year cardiovascular risk and the score remains unchanged, we will take no therapeutic measures (risk <1%), whereas if we calculate the risk only once every 10 years, we will take aggressive therapeutic measures as soon as the patient comes to our practice, since the risk reaches the cut-off defined as high by the SCORE project (5%, i.e. one in every twenty patients) (Conroy et al., 2003). In other words, a short-term prediction made regularly could change therapeutic decision-making for the prevention of cardiovascular disease, provided that the risk can be calculated periodically. Moreover, since the risk table of the Puras-GEVA study also includes predictions at 4, 6, 8, 10, 12 and 14 years, we selected the lowest time point, because predictions over a longer period could be more dispersed (Rizopoulos, 2012). For these reasons, this time point (2 years) was selected for the simulation.

As for the longitudinal measurements during follow-up (construction sample), we assume that the patient visits the physician once every 3 months so that SBP, HbA1c and the atherogenic index can be measured. This continues until the end of each patient's follow-up. In the sample used for statistical validation through simulation, on the other hand, we assume a certain probability that measurements of all the longitudinal parameters were recorded in the clinical history every 3 months over the previous 5 years (𝑡 < 0). This probability differs for each visit and depends on the patient. In other words, we have intermittent measurements of all these parameters from 𝑡 = −5 years to 𝑡 = 0.
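The intermittent history described above can be mimicked with a few lines of Python. This is only a sketch under invented assumptions: the actual formulas are in the supplementary material cited below and are not reproduced here, so the 90-day step, the 20 candidate visits and the visit probabilities are hypothetical.

```python
import random


def simulate_history(probabilities, step_days=90, rng=random):
    """Days (t < 0) at which a past visit left a record, plus baseline t = 0.

    probabilities[k] is the chance that the visit k + 1 steps before
    baseline was recorded for this patient.
    """
    days = [
        -(k + 1) * step_days
        for k, p in enumerate(probabilities)
        if rng.random() < p
    ]
    return sorted(days) + [0]


rng = random.Random(2015)
# 20 quarterly visits cover roughly 5 years; these probabilities are invented.
probs = [0.9 - 0.03 * k for k in range(20)]
history = simulate_history(probs, rng=rng)
print(history)
```

Giving each patient a different `probs` vector reproduces the idea that the chance of a record depends on both the visit and the patient.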

The supplementary material of the third article presented in this doctoral thesis (Palazón-Bru et al., 2015) details all the mathematical formulas used to build our data sets, always based on the Puras-GEVA study (Artigao-Ródenas et al., 2015).

One might think that over a shorter period of 2 years the cardiovascular risk factors would show no variability. However, the supplementary material shows that the models used do exhibit temporal variability in the risk factors (Palazón-Bru et al., 2015); otherwise they would contain only the constant with a very small random error. In other words, it makes sense to use this prediction period.

We decided to simulate a data set because no real data set is currently available. This way of presenting a new method has been used by other authors working on joint models of survival and longitudinal data, since the sole aim of the simulated data set is to show the reader how to apply the new method (Faucett & Thomas, 1996; Henderson, Diggle & Dobson, 2000; Wang & Taylor, 2001; Brown, Ibrahim & DeGruttola, 2005; Zeng & Cai, 2005; Vonesh, Greene & Schluchter, 2006; Rizopoulos & Ghosh, 2011).

Construction of the points system

1) We estimated the multivariable Cox model with time-dependent variables, defined as the last recorded value. Its coefficients are shown in Table 1.

Table 1: Parameters (βs) of the multivariable Cox model.

Variable                          β         p-value
Age (baseline) (per 1 year)       0.0846    <0.001
SBP (per 1 mmHg)                  0.00874   <0.001
HbA1c (per 1%)                    0.188     <0.001
Atherogenic index (per 1 unit)    0.191     <0.001
Male sex                          0.479     0.001
Smoker (baseline)                 0.721     <0.001

Abbreviations: SBP, systolic blood pressure; HbA1c, glycated haemoglobin.

Goodness-of-fit (likelihood ratio test): χ2=912.3, p<0.001.
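Since a Cox coefficient β corresponds to a hazard ratio of exp(β), the values in Table 1 can be converted directly. The short Python sketch below is added for illustration and is not part of the original analysis; the dictionary keys and the 20 mmHg example are choices made here.

```python
from math import exp

# Coefficients copied from Table 1.
betas = {
    "Age (per year)": 0.0846,
    "SBP (per mmHg)": 0.00874,
    "HbA1c (per 1%)": 0.188,
    "Atherogenic index (per unit)": 0.191,
    "Male sex": 0.479,
    "Smoker": 0.721,
}

# Hazard ratio per unit increase of each variable.
hazard_ratios = {name: exp(b) for name, b in betas.items()}
for name, hr in hazard_ratios.items():
    print(f"{name}: HR = {hr:.3f}")

# Relative hazard for a 20 mmHg difference in SBP, other covariates equal.
hr_sbp_20 = exp(20 * betas["SBP (per mmHg)"])
print(f"SBP +20 mmHg: HR = {hr_sbp_20:.3f}")
```

All per-unit hazard ratios obtained this way are far below the threshold of 9 mentioned in the discussion of the limitations of the 𝐶 statistic.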



Regarding the analysis of the martingale residuals (Figure 4), all the variables have a linear form in the model, since the red line shows perfect linearity (the red line is the linear fit between the two variables in each plot). This was expected, because the data were simulated that way. Nevertheless, it must be verified before continuing with the proposed method.


Figure 4: Functional form of the covariates in the model.

Next we must verify that the model fits the data well, by analysing the Cox-Snell residuals (Figure 5). As can be seen, the red line (exponential distribution with parameter 1) stays within the confidence intervals, so our model meets all its basic assumptions and we can proceed to step 2) of the construction.

Figure 5: Analysis of the Cox-Snell residuals.

2) We adapted the coefficients of the multivariable model to a points system following the methodology of the Framingham Heart Study. Since this methodology is widely used, we simply give the result as a figure (Figure 6). All the calculations are nevertheless attached as an Excel spreadsheet in the supplementary material where the technique is presented (Palazón-Bru et al., 2015). A figure was chosen instead of a table to make it easier to read, since the cells are more widely spaced and less likely to confuse the clinician. This idea has been used by both the SCORE project and the Framingham study. Note that the risk groups were built as in the SCORE project.


Figure 6: Points system to predict cardiovascular disease within 2 years.

Abbreviations: SBP, systolic blood pressure; HbA1c, glycated haemoglobin; TC, total cholesterol; HDL-c, HDL cholesterol.


3) We fitted the joint models for each of the longitudinal parameters; their estimated coefficients are shown in Table 2. No baseline hazard function was specified.

Table 2: Parameters of the joint models built.

Variable                  SBP (mmHg)  p-value  HbA1c (%)  p-value  Atherogenic index  p-value
Event process
Male sex                  0.428       <0.001   0.475      <0.001   0.446              <0.001
Age (per 1 year)          0.0837      <0.001   0.0840     <0.001   0.0833             <0.001
Smoker                    0.731       <0.001   0.757      <0.001   0.775              <0.001
Parameter (per 1 unit)    0.0085      <0.001   0.216      <0.001   0.195              <0.001
Longitudinal process: fixed effects
1                         133.557     <0.001   6.158      <0.001   4.602              <0.001
T                         0.0046      <0.001   0.0001     <0.001   0.0001             <0.001
Longitudinal process: random effects
1                         21.683      N/A      1.346      N/A      1.324              N/A
T                         0.0358      N/A      *          *        0.0013             N/A
Residuals                 8.933       N/A      0.357      N/A      0.302              N/A

Abbreviations: SBP, systolic blood pressure; HbA1c, glycated haemoglobin; N/A, not applicable. *: term removed because of convergence problems. The strategy was to remove terms from the most complex to the simplest.

Goodness-of-fit: 1) SBP: χ2=371,574.1, p<0.001; 2) HbA1c: χ2=210,881.1, p<0.001; 3) Atherogenic index: χ2=121,118.0, p<0.001.


Regarding the basic assumptions of the models, Figure 7 first shows that there is no trend between the fitted values and the subject-specific residuals, since the red line (linear fit between the two variables on the axes) stays very close to 𝑦 = 0.

Figure 7: Plots of the subject-specific residuals against the values fitted by the models.

Top: systolic blood pressure. Bottom left: glycated haemoglobin. Bottom right: atherogenic index.


As for the normality of the subject-specific residuals, given the volume of data available we can assume asymptotic normality. Nevertheless, Figure 8 provides the Q-Q plots comparing the distribution of the data with a normal distribution. These plots also allow us to conclude that our residuals follow a normal distribution.

Figure 8: Q-Q plots of the subject-specific residuals.

Top: systolic blood pressure. Bottom left: glycated haemoglobin. Bottom right: atherogenic index.

In the analysis of the marginal residuals against the fitted values (Figure 9), the red line (linear fit between the variables shown in each plot) shows no pattern away from 𝑦 = 0. In other words, all the assumptions of the longitudinal submodel have been verified, so we now check the basic assumptions of the survival submodel.

Figure 9: Plots of the marginal residuals against the fitted values.

Top: glycated haemoglobin. Bottom left: systolic blood pressure. Bottom right: atherogenic index.

Figure 10 shows both types of plot used to verify the basic assumptions of the survival submodel. In the first column (fitted values of the longitudinal parameters against the martingale residuals) no trend is observed, since the red line (linear fit between the two variables in the plot) resembles the line 𝑦 = 0. In addition, we can assume that the Cox-Snell residuals follow an exponential distribution with unit parameter (the red line lies within the confidence intervals of the survival function).


Figure 10: Plots to verify the basic assumptions of the survival submodel.

Left column: martingale residuals. Right column: Cox-Snell residuals. Top: systolic blood pressure. Centre: glycated haemoglobin. Bottom: atherogenic index.


Statistical validation by simulation

Concordance was very satisfactory: 0.844 (95% CI: 0.842-0.846). The comparison between observed and expected events showed no statistically significant differences (Figure 11).

Figure 11: Comparison between the proportion (%) of observed and expected events in the different risk groups.

Risk group    Low     Medium   High     Very high
Observed      0.46    3.17     10.78    24.63
Expected      0.39    2.25     7.39     22.83

p=0.555


Potential use

A new patient with the following characteristics comes to our practice: male, 83 years old, non-smoker, receiving pharmacological treatment (one antihypertensive and one oral antidiabetic drug) and non-pharmacological treatment (diet and exercise). The history of his cardiovascular risk factors is also available (Table 3).

Table 3: Parameters recorded in the electronic clinical history that are needed to apply our points system.

Time (days)   SBP (mmHg)   HbA1c (%)   Atherogenic index
-360          152          5.1         3.56
-330          135          5.3         3.23
-270          164          4.7         3.45
-180          153          4.4         4.12
-90           170          5.0         4.15
0             145          4.9         5.17

Abbreviations: SBP, systolic blood pressure; HbA1c, glycated haemoglobin. Time is negative because it refers to values recorded before baseline (𝑡 = 0).

After applying the model, we obtain a histogram of the patient's cardiovascular risk score (Figure 12). It shows a high cardiovascular risk, since most simulations are concentrated at 16 points. The estimated score was 16 (95% CI: 15-17). The median score corresponded to an SBP of 160 mmHg, an HbA1c of 4.9% and an atherogenic index of 6.76. Bearing in mind that the model includes factors on which we cannot act (sex and age), which give the patient a minimum of 13 points, we must devise strategies so that the patient does not score in the remaining categories of the scale (Figure 6).


Figure 12: Cardiovascular risk of a new theoretical patient (pre-intervention).

The physician now estimates that if the patient adheres to a series of interventions [pharmacological (adding two antihypertensive drugs → −20 mmHg; prescribing a statin → −40% atherogenic index) and non-pharmacological (reducing salt in meals → −5 mmHg)], his longitudinal parameters after 3 months will be: SBP 120 mmHg (145 − 2·10 − 5 = 120 mmHg), atherogenic index 3.10 (5.17 − 40% = 3.10) and HbA1c 4.9% (unchanged, since no intervention targets it). We then apply the model again with this new data point and obtain the two-year cardiovascular risk (Figure 13). The estimated score is 15 (95% CI: 14-15) and the values giving the median score are: SBP 124 mmHg, atherogenic index 4.85 and HbA1c 5.0%. The risk has therefore decreased, since the patient now generally has 14 points (Figure 6).
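The target values in this example follow from simple arithmetic, reproduced in the Python sketch below. The helper `target_after_intervention` is a hypothetical name introduced for illustration; the effect sizes are those assumed by the physician in the text.

```python
def target_after_intervention(baseline, absolute_changes=(), relative_change=0.0):
    """Apply additive changes, then a proportional change, to a baseline value."""
    value = baseline + sum(absolute_changes)
    return value * (1.0 + relative_change)


# SBP: two extra antihypertensives (-10 mmHg each) and less salt (-5 mmHg).
sbp_target = target_after_intervention(145, absolute_changes=(-10, -10, -5))
# Atherogenic index: a statin assumed to lower it by 40%.
ai_target = target_after_intervention(5.17, relative_change=-0.40)
print(sbp_target, round(ai_target, 2))
```

These targets are the values inserted into the patient's history at the 3-month visit before the simulations are rerun.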


Figure 13: Cardiovascular risk of a new theoretical patient (post-intervention).

1.7. Discussion.

Summary

This doctoral thesis offers possible improvements to the construction of existing predictive models of cardiovascular disease. These improvements are mainly: 1) obtaining short- and medium-term predictions; 2) selecting the best combination of variables to discriminate more precisely which patients will develop a cardiovascular disease; 3) detailing a methodology for building predictive models of cardiovascular disease that takes into account the variability of cardiovascular risk factors without losing the simplicity of points systems, which are widely used in routine clinical practice worldwide (Conroy et al., 2003; National Heart, Lung, and Blood Institute, 2015).

Comparison with the existing literature

Regarding the first proposed improvement, we give clinicians the possibility of calculating cardiovascular risk over periods shorter than 10 years, which may improve decision-making: if a patient has a high cardiovascular risk over a very short period, a much more aggressive therapeutic strategy is required. The Framingham and SCORE models lack this feature (Conroy et al., 2003; Cooney, Dudina & Graham, 2009; National Heart, Lung, and Blood Institute, 2015). However, since survival will already have been calculated at every time point, this could be implemented in future cardiovascular risk scales.

Second, calculating all possible combinations of explanatory variables helps us choose the best mathematical model for obtaining the most accurate prediction of cardiovascular disease. This raises the value of our 𝐶 statistic to the maximum possible. This variable-selection strategy has not been used by the researchers of the Framingham study or of the SCORE project (Conroy et al., 2003; Cooney, Dudina & Graham, 2009; National Heart, Lung, and Blood Institute, 2015). Applying it to the SCORE or Framingham data might yield only a small difference from the previous 𝐶 statistic. Nevertheless, a gain of 5% may mean identifying a larger proportion of the patients who will suffer a cardiovascular disease, which would lead to lower cardiovascular morbidity and mortality.

Finally, regarding the third proposed improvement, existing cardiovascular risk scales do not consider the temporal variability of the parameters used to monitor the risk factors, although, as a very positive aspect, they are simple enough to be applied immediately by healthcare professionals, who are the ones who will actually use these mathematical models (Conroy et al., 2003; Cooney, Dudina & Graham, 2009; National Heart, Lung, and Blood Institute, 2015). The joint models currently in use, for their part, do take into account the variability over time of a single longitudinal parameter (Rizopoulos, 2011), but they are not as easy to interpret as a points system and do not allow several longitudinal parameters to be used, a key issue given the multifactorial aetiology of cardiovascular disease. We have tried to merge all these techniques into a single algorithm while keeping the strengths of each (relative risk model, points systems, dynamic predictions...).

Comparing the proposed method with current cardiovascular risk scales raises a problem: our model is better suited to predictions over a shorter time horizon, because the further we move from the baseline situation (t = 0) when predicting, the greater the variability of the predictions of the longitudinal parameters (Rizopoulos, 2012). The same behavior is seen in economics (the stock market) and in meteorology (weather forecasting), although it depends on the nature of the data being analyzed, at both the individual and the population level. None of this is a weakness of our model: because the predictions of the longitudinal parameters are dynamic (Rizopoulos, 2011), the risk is recalculated immediately every time we update the patient's clinical information. We saw this in the example given in the results, where entering new values of the longitudinal parameters updates them and determines the patient's new distribution of points. In other words, the proposed method could be used to calculate the patient's risk at every visit to the health professional, whereas traditional risk scales can be used at longer intervals, since their prognosis is for 10 years.
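As an illustrative sketch only of why uncertainty can grow with the prediction horizon, suppose (this is an assumption, not a specification taken from the thesis) that a longitudinal parameter follows a linear mixed model with a random intercept and a random slope. The symbols b_{0i}, b_{1i}, \sigma_0, \sigma_1, \sigma_{01} and \sigma are introduced here purely for illustration.

```latex
y_i(t) = \beta_0 + \beta_1 t + b_{0i} + b_{1i}\,t + \varepsilon_i(t), \qquad
(b_{0i}, b_{1i})^\top \sim N\!\left(0,
\begin{pmatrix} \sigma_0^2 & \sigma_{01} \\ \sigma_{01} & \sigma_1^2 \end{pmatrix}\right), \qquad
\varepsilon_i(t) \sim N(0, \sigma^2)

\operatorname{Var}\left[y_i(t)\right] = \sigma_0^2 + 2t\,\sigma_{01} + t^2\,\sigma_1^2 + \sigma^2
```

Under these assumptions the marginal variance is quadratic in t, so whenever \sigma_{01} \ge 0 it increases as t moves away from the baseline t = 0.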

In short, both types of model could be used together to assess risk in both the short and the long term. Although discrepancies have been observed between short-term and long-term cardiovascular predictions (Quispe et al., 2015), the periodic use of short-term predictions that take into account the variability of the risk factors can complement long-term cardiovascular models. In other words, our intention is that, in clinical practice, the short-term model be used periodically for patients who attend the doctor's office frequently and the long-term model for those who do not.

Implications for clinical practice and research

For the first proposed improvement, we suggest calculating, in the cardiovascular risk scales already constructed, the probability of an event over shorter time periods, since these data will be available in the database built for the 10-year predictive scale. The same technique could likewise be applied to new cardiovascular risk scales.
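As a hedged illustration of how shorter horizons could be read off an existing Cox-based scale: if the baseline survival \hat{S}_0(t) is estimated at each horizon of interest from the same database, the predicted risk takes the standard proportional hazards form below. The horizons of 2, 4, ..., 10 years and the centring on the covariate means \bar{x}_k are assumptions made for this example.

```latex
\hat{P}(T \le t \mid x) \;=\; 1 - \hat{S}_0(t)^{\exp\left(\sum_{k} \hat{\beta}_k\,(x_k - \bar{x}_k)\right)},
\qquad t \in \{2, 4, 6, 8, 10\}\ \text{years}
```

Only \hat{S}_0(t) changes with the horizon; the coefficients \hat{\beta}_k are those of the model already fitted.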

Next, for the second proposed improvement (combinations), it is clear that the cardiovascular scales already constructed would have to apply this new technique before it could be used. In other words, it is not immediate and could change the values of the model. Consequently, we propose using this technique in the construction of future cardiovascular risk scales.

Finally, for the most complex of the suggested methods, we must bear in mind that obtaining simulations of the longitudinal parameters is not a simple task and carries a computational cost of approximately one minute if a total of 100 simulations are run on a personal computer. On the other hand, the historical values of the longitudinal parameters are recorded in the clinical record, which is generally electronic (Palazón-Bru et al., 2014). It follows that all the information needed to apply our models is held in computerized form, so the implemented algorithms can be adapted to the language of the database in which the risk factor values are recorded, making all the calculations immediate for the health professional. In other words, at the press of a button and within a short time, the histogram shown in Figure 12, the theoretical points system and the set of risk factor values that determine the median of the scores would appear together on screen. Moreover, when intervening, the health professional would enter the time of the intervention and the values they expect the patient to reach. Once this new information is entered, the two histograms would be displayed together (Figures 12 and 13), allowing the clinician to see at a glance the benefit of the intervention.
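A minimal sketch of the summary step described above, assuming each simulation has already been reduced to a total score and the risk factor values that produced it. The data structure, the function name `summarize_simulations` and the sample values are hypothetical and are not taken from the thesis.

```python
import statistics
from collections import Counter


def summarize_simulations(simulations):
    """Summarize simulated trajectories.

    simulations: list of (total_points, risk_factor_values) pairs, one per
    simulated trajectory (hypothetical structure for this sketch).
    Returns the median score, the risk factor values of the first simulation
    reaching that score, and the score histogram.
    """
    points = [p for p, _ in simulations]
    # median_low always returns an observed score, so a matching scenario exists
    median_points = statistics.median_low(points)
    median_values = next(v for p, v in simulations if p == median_points)
    histogram = Counter(points)
    return median_points, median_values, histogram


# Invented example: five simulated scenarios (the thesis mentions 100)
sims = [
    (12, {"sbp": 150, "tc": 6.1}),
    (8, {"sbp": 130, "tc": 5.0}),
    (10, {"sbp": 140, "tc": 5.6}),
    (15, {"sbp": 165, "tc": 6.8}),
    (10, {"sbp": 142, "tc": 5.4}),
]
median_points, median_values, histogram = summarize_simulations(sims)
print(median_points, median_values, dict(histogram))
```

Running the same function on the scores simulated before and after a planned intervention would give the two histograms that the clinician compares.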

Because the algorithm was developed on a simulated data set, we encourage other authors who have cardiovascular databases of the form described to implement a model with the characteristics detailed in this doctoral thesis. If greater predictive accuracy is then obtained on real data, this methodology can be applied to obtain the best prognosis and to make the optimal short-term decisions for the benefit of the patient. Nevertheless, we must bear in mind that the proposed method is based on a combination of mathematical models used in the scientific literature; that is, our technique is entirely correct at the theoretical level, as we have been very rigorous in each of the steps followed. At the practical level, we can determine the value of �� and the complexity of the models in order to apply the proposed method. Finally, we note that the algorithm developed in this work can be applied to other diseases or to other fields of knowledge, such as economics.



1.8. References.

American Diabetes Association. 2014. Standards of medical care in diabetes--

2014. Diabetes Care 37 Suppl 1:S14-80.

Andersen PK, Gill RD. 1982. Cox's Regression Model for Counting Processes:

A Large Sample Study. Annals of Statistics 10: 1100-1120.

Artigao-Ródenas LM, Carbayo-Herencia JA, Palazón-Bru A, Divisón-Garrote

JA, Sanchis-Domènech C, Vigo-Aguiar I, Gil-Guillén VF; on behalf of GEVA.

2015. Construction and validation of a 14-year cardiovascular risk score for use

in the general population: the PURAS-GEVA chart. Medicine (Baltimore) 94:

e1980.

Brown ER, Ibrahim JG, DeGruttola V. 2005. A flexible B-spline model for

multiple longitudinal biomarkers and survival. Biometrics 61: 64-73.

Conroy RM, Pyörälä K, Fitzgerald AP, Sans S, Menotti A, De Backer G, De

Bacquer D, Ducimetière P, Jousilahti P, Keil U, Njølstad I, Oganov RG,

Thomsen T, Tunstall-Pedoe H, Tverdal A, Wedel H, Whincup P, Wilhelmsen L,

Graham IM; SCORE project group. 2003. Estimation of ten-year risk of fatal

cardiovascular disease in Europe: the SCORE project. European Heart Journal

24: 987-1003.

Cooney MT, Dudina AL, Graham IM. 2009. Value and limitations of existing

scores for the assessment of cardiovascular risk: a review for clinicians. Journal

of the American College of Cardiology 54: 1209-1227.



Faucett CL, Thomas DC. 1996. Simultaneously modelling censored survival

data and repeatedly measured covariates: a Gibbs sampling approach.

Statistics in Medicine 15: 1663-1685.

Food and Agriculture Organization of the United Nations and World Health

Organization. 1973. FAO/WHO. Energy and protein requirements: Report of a

joint FAO/WHO ad hoc expert committee. FAO Nutrition Meetings Report

Series No. 52. WHO Technical Report Series No. 522. Rome and Geneva:

FAO/WHO.

Henderson R, Diggle P, Dobson A. 2000. Joint modelling of longitudinal

measurements and event time data. Biostatistics 1: 465-480.

Hosmer DW, Lemeshow S. 2000. Applied Logistic Regression. New York:

Wiley.

Hosmer DW, Lemeshow S, May S. 2008. Applied Survival Analysis: Regression

Modeling of Time-to-Event Data. New York: Wiley.

James PA, Oparil S, Carter BL, Cushman WC, Dennison-Himmelfarb C,

Handler J, Lackland DT, LeFevre ML, MacKenzie TD, Ogedegbe O, Smith SC

Jr, Svetkey LP, Taler SJ, Townsend RR, Wright JT Jr, Narva AS, Ortiz E. 2014.

2014 evidence-based guideline for the management of high blood pressure in

adults: report from the panel members appointed to the Eighth Joint National

Committee (JNC 8). JAMA 311: 507-520. Erratum in: JAMA 311: 1809.

Molinero LM. 2003. Modelos de riesgo cardiovascular. Estudio de Framingham.

Proyecto SCORE. Available at http://www.seh-lelha.org/pdf/modelries.pdf

(accessed July 2015).


National Cholesterol Education Program (NCEP) Expert Panel on Detection,

Evaluation, and Treatment of High Blood Cholesterol in Adults (Adult Treatment

Panel III). 2002. Third Report of the National Cholesterol Education Program

(NCEP) Expert Panel on Detection, Evaluation, and Treatment of High Blood

Cholesterol in Adults (Adult Treatment Panel III) final report. Circulation 106:

3143-3421.

National Heart, Lung, and Blood Institute (Boston University). 2015. The

Framingham Heart Study. Available at http://www.framinghamheartstudy.org/

(accessed July 2015).

Lloyd-Jones DM. 2010. Cardiovascular risk prediction: basic concepts, current

status, and future directions. Circulation 121: 1768-1777.

López-Bru D, Palazón-Bru A, Folgado-de la Rosa DM, Gil-Guillén VF. 2015.

Scoring System for Mortality in Patients Diagnosed with and Treated Surgically

for Differentiated Thyroid Carcinoma with a 20-Year Follow-Up. PLoS One 10:

e0128620.

Palazón-Bru A, Carbayo-Herencia JA, Vigo MI, Gil-Guillén VF. 2016. A method

to construct a points system to predict cardiovascular disease considering

repeated measures of risk factors. PeerJ 4: e1673.

Palazón-Bru A, Gil-Guillén VF, Orozco-Beltrán D, Pallarés-Carratalá V, Valls-

Roca F, Sanchís-Domenech C, Martín-Moreno JM, Redón J, Navarro-Pérez J,

Fernández-Giménez A, Pérez-Navarro AM, Trillo JL, Usó R, Ruiz E. 2014. Is

the physician's behavior in dyslipidemia diagnosis in accordance with

guidelines? Cross-sectional ESCARVAL study. PLoS One 9:e91567.


Pencina MJ, D'Agostino RB. 2004. Overall C as a measure of discrimination in

survival analysis: model specific population value and confidence interval

estimation. Statistics in Medicine 23: 2109-2123.

Perk J, De Backer G, Gohlke H, Graham I, Reiner Z, Verschuren M, Albus C,

Benlian P, Boysen G, Cifkova R, Deaton C, Ebrahim S, Fisher M, Germano G,

Hobbs R, Hoes A, Karadeniz S, Mezzani A, Prescott E, Ryden L, Scherer M,

Syvänne M, Scholte op Reimer WJ, Vrints C, Wood D, Zamorano JL, Zannad F;

European Association for Cardiovascular Prevention & Rehabilitation (EACPR);

ESC Committee for Practice Guidelines (CPG). 2012. European Guidelines on

cardiovascular disease prevention in clinical practice (version 2012). The Fifth

Joint Task Force of the European Society of Cardiology and Other Societies on

Cardiovascular Disease Prevention in Clinical Practice (constituted by

representatives of nine societies and by invited experts). European Heart

Journal 33: 1635-1701. Erratum in: European Heart Journal 33: 2126.

Quispe R, Bazo-Alvarez JC, Burroughs Peña MS, Poterico JA, Gilman RH,

Checkley W, Bernabé-Ortiz A, Huffman MD, Miranda JJ; PERU MIGRANT

Study; CRONICAS Cohort Study Group. 2015. Distribution of Short-Term and

Lifetime Predicted Risks of Cardiovascular Diseases in Peruvian Adults. Journal

of the American Heart Association 4: e002112.

Ramírez-Prado D, Palazón-Bru A, Folgado-de la Rosa DM, Carbonell-

Torregrosa MÁ, Martínez-Díaz AM, Martínez-St John DR, Gil-Guillén VF. 2015.

A four-year cardiovascular risk score for type 2 diabetic inpatients. PeerJ 3:

e984.

Rizopoulos D. 2011. Dynamic predictions and prospective accuracy in joint

models for longitudinal and time-to-event data. Biometrics 67: 819-829.



Rizopoulos D, Ghosh P. 2011. A Bayesian semiparametric multivariate joint

model for multiple longitudinal outcomes and a time-to-event. Statistics in

Medicine 30: 1366-1380.

Rizopoulos D. 2012. Joint Models for Longitudinal and Time-to-Event Data

With Applications in R. Boca Raton: CRC Press.

Stone NJ, Robinson JG, Lichtenstein AH, Bairey Merz CN, Blum CB, Eckel RH,

Goldberg AC, Gordon D, Levy D, Lloyd-Jones DM, McBride P, Schwartz JS,

Shero ST, Smith SC Jr, Watson K, Wilson PW; American College of

Cardiology/American Heart Association Task Force on Practice Guidelines.

2014. 2013 ACC/AHA guideline on the treatment of blood cholesterol to reduce

atherosclerotic cardiovascular risk in adults: a report of the American College of

Cardiology/American Heart Association Task Force on Practice Guidelines.

Journal of the American College of Cardiology 63(25 Pt B):2889-2934. Erratum

in: Journal of the American College of Cardiology 63(25 Pt B): 3024-3025.

Sullivan LM, Massaro JM, D'Agostino RB Sr. 2004. Presentation of multivariate

data for clinical use: The Framingham Study risk score functions. Statistics in

Medicine 23: 1631-1660. Review.

Verbeke G, Lesaffre E. 1997. The effect of misspecifying the random effects

distribution in linear mixed models for longitudinal data. Computational Statistics

and Data Analysis 23: 541-56.

Vonesh EF, Greene T, Schluchter MD. 2006. Shared parameter models for the

joint analysis of longitudinal data and event times. Statistics in Medicine 25:

143-163.



Wang Y, Taylor JMG. 2001. Jointly Modeling Longitudinal and Event Time Data

With Application to Acquired Immunodeficiency Syndrome. Journal of the

American Statistical Association 96: 895-905.

World Health Organization. 2014. The top 10 causes of death. Available at

http://www.who.int/mediacentre/factsheets/fs310/en/ (accessed July 2015).

Zeng D, Cai J. 2005. Simultaneous modelling of survival and longitudinal data

with an application to repeated quality of life measures. Lifetime Data Analysis

11: 151-174.


2. PUBLISHED WORKS.



Medicine®
OBSERVATIONAL STUDY

Construction and Validation of a 14-Year Cardiovascular Risk Score for Use in the General Population: The Puras-GEVA Chart

Luis Miguel Artigao-Rodenas, MD, PhD, Julio Antonio Carbayo-Herencia, MD, PhD, Antonio Palazon-Bru, PhD, Juan Antonio Divison-Garrote, MD, PhD, Carlos Sanchis-Domenech, MD, PhD, Isabel Vigo-Aguiar, PhD, and Vicente Francisco Gil-Guillen, MD, PhD

Abstract: The current cardiovascular risk tables are based on a 10-year period and therefore do not allow for predictions in the short or medium term. Thus, we are unable to take more aggressive therapeutic decisions when this risk is very high.

To develop and validate a predictive model of cardiovascular disease (CVD), to enable calculation of risk in the short, medium and long term in the general population.

Cohort study with 14 years of follow-up (1992–2006) was obtained through random sampling of 342,667 inhabitants in a Spanish region. Main outcome: time-to-CVD. The sample was randomly divided into 2 parts [823 (80%), construction; 227 (20%), validation]. A stepwise Cox model was constructed to determine which variables at baseline (age, sex, blood pressure, etc) were associated with CVD. The model was adapted to a points system and risk groups based on epidemiological criteria (sensitivity and specificity) were established. The risk associated with each score was calculated every 2 years up to a maximum of 14. The estimated model was validated by calculating the C-statistic and comparison between observed and expected events.

In the construction sample, 76 patients experienced a CVD during the follow-up (82 cases per 10,000 person-years). Factors in the model included sex, diabetes, left ventricular hypertrophy, occupational physical activity, age, systolic blood pressure × heart rate, number of cigarettes, and total cholesterol. Validation yielded a C-statistic of 0.886 and the comparison between expected and observed events was not significant (P: 0.49–0.75).

We constructed and validated a scoring system able to determine, with a very high discriminating power, which patients will develop a CVD in the short, medium, and long term (maximum 14 years). Validation studies are needed for the model constructed.

(Medicine 94(47):e1980)

Abbreviations: ABI = ankle brachial index, AUC = area under the receiver operating characteristic curve, BMI = body mass index, CI = confidence interval, CVD = cardiovascular diseases, DBP = diastolic blood pressure, FBG = fasting blood glucose, HDL-c = high-density lipoprotein cholesterol, NLR = negative likelihood ratio, PLR = positive likelihood ratio, SBP = systolic blood pressure, SCORE = Systematic Coronary Risk Evaluation, TC = total cholesterol.

Editor: Efird Jimmy.
Received: July 20, 2015; revised: September 10, 2015; accepted: October 13, 2015.
Zone III Primary Health Care Centre, Health Service of Castilla-La Mancha, Albacete (LMA-R); San Antonio Catholic University, Murcia (JAC-H, JAD-G); Department of Clinical Medicine, Miguel Hernandez University, San Juan de Alicante (JAC-H, AP-B, VFG-G); Research Unit, Elda General Hospital, Elda (AP-B, VFG-G); Casas Ibanez Primary Health Care Centre, Health Service of Castilla-La Mancha, Albacete (JAD-G); Health Center of Algemesi, Generalitat Valenciana, Algemesi, Valencia (CS-D); and Department of Applied Mathematics, University of Alicante, Alicante, Spain (IV-A).
Correspondence: Antonio Palazon-Bru, PhD, Department of Clinical Medicine, Miguel Hernandez University, San Juan de Alicante, 03550, Alicante, Spain (e-mail: [email protected]).
This study has been partially funded by: 1) The Community Board of Castilla-La Mancha, Regional Ministry of Health and Social Affairs (Order of July 3rd, 1992 and Order of September 14th, 1993, both published in Diario Oficial de Castilla-La Mancha, DOCM); 2) Grant from the Foundation for Health Research in Castilla-La Mancha (FISCAM), file number 03069–00.
The authors have no conflicts of interest to disclose.
Copyright © 2015 Wolters Kluwer Health, Inc. All rights reserved.
This is an open access article distributed under the Creative Commons Attribution-NonCommercial-NoDerivatives License 4.0, where it is permissible to download, share and reproduce the work in any medium, provided it is properly cited. The work cannot be changed in any way or used commercially.
ISSN: 0025-7974
DOI: 10.1097/MD.0000000000001980

Medicine • Volume 94, Number 47, November 2015

INTRODUCTION

Cardiovascular diseases (CVD) with underlying atherosclerosis are the leading cause of mortality in the world.1 As a result, predictive models of CVD have been obtained to determine which patients are more likely to develop a disease of this nature and in turn to take action on modifiable factors to decrease the likelihood of CVD.2 Because these models come from complex mathematical expressions, they, however, have been transformed into points systems so that health professionals can calculate the probability of CVD by performing simple sums to determine the benefits of a possible intervention on cardiovascular risk factors.3

As a reference, the Systematic COronary Risk Evaluation risk chart (SCORE) is used in Europe whereas the United States uses the Framingham risk table.2 These risk charts are based on a follow-up of 10 years and thus do not allow for accurate predictions in periods shorter (short or medium term) or longer than 10 years (long term). This is a clinically important question, because if we know that a patient has a high probability of developing CVD in the short term (eg, 2 years), we should give more intense drug and nonpharmacological therapy, because failing to do so may result in the patient experiencing a CVD. Given that we are unable to obtain short- to medium-term probabilities from the current points systems, we conducted a study in a Spanish region in which we constructed and validated a predictive model for CVD of the proposed features using a population-based cohort followed up for a period of 14 years. The results of this work provide a tool to help us make treatment decisions in the short, medium, and long term to reduce the incidence of CVD in the general population.

METHODS

Study Population

The study population comprised inhabitants of the province of Albacete (Spain) who were at least 18 years of age (adults). In 1991, this province consisted of 342,667 inhabitants, equivalent to 0.89% of the whole country.4

Study Design and Participants

This was a population-based cohort study with a maximum follow-up of 14 years. The cohort was recruited through a 3-stage sampling design of all the inhabitants in the province of Albacete registered in the 1991 census. The first phase consisted of a stratified sample based on groups according to the size of the population of the nucleus of residence (capital, 39.7%; >10,000, 23.3%; 2001–10,000, 20.8%; 501–2000, 14.5%; and <501, 1.7%); the second phase was based on a cluster sampling of the municipalities contained in the above groups, and the final phase was a simple random sampling. The sample size in each municipality was proportional to the size of its population. This process (baseline) was conducted between February 1, 1992 and September 5, 1994. Between July 13, 2002 and December 2, 2006, a second visit to the study participants (follow-up) was performed. For both processes, the selected patients were contacted by mail (up to 2 times) and by telephoning those who did not respond. Patients in secondary cardiovascular prevention were excluded from this study.

Variables and Measurements

The primary study variable was the time to the first occurrence of CVD (time-to-event data). This (date of event occurrence) was obtained from the patient's clinical documents and was defined as presenting at least one of the following conditions: angina of any kind, myocardial infarction, stroke, peripheral arterial disease of the lower limbs, or death from CVD. Mortality (date and cause) was obtained through the patient's death certificate. Secondary variables were sex; personal history of hypertension, diabetes, and dyslipidemia; family history of coronary heart disease; left ventricular hypertrophy; occupational physical activity (heavy or moderate activity; light activity); age (years); body mass index (BMI) (kg/m²); systolic (SBP) and diastolic blood pressure (mm Hg); cigarettes per day; fasting blood glucose (mmol/L); total cholesterol (TC) (mmol/L); high-density lipoprotein cholesterol (mmol/L); triglycerides (mmol/L); fibrinogen (mmol/L); heart rate (bpm); and ankle brachial index (ABI). Moreover, the product of the SBP and heart rate was used.5

A patient was considered to have a personal history of hypertension, dyslipidemia, or diabetes when there was daily drug treatment or the patient responded affirmatively to the question: Have you been told by your doctor that you have diabetes, hypertension, or hypercholesterolemia? Sex, date of birth, number of cigarettes smoked (as in other studies,2 this variable had a value of 0 when the patient was either a nonsmoker or a former smoker), and occupational physical activity were obtained through interviews with the patient. The latter was measured according to the Food and Agriculture Organization of the United Nations and the World Health Organization: light activity: activity associated with sitting at a desk or behind a counter with automated instruments; moderate activity: continuous light physical activity, such as light work in industry or in agriculture out of season; intense activity: heavy work and at times, energetic (agricultural production, mining, or steel work). If a person was not working, they were classified as performing light activity.6 In addition, when a relative (parents, children, and/or siblings) had suffered an event before 56 years of age this was considered to be a family history of ischemic heart disease. This variable was collected by interview with the patient.

Left ventricular hypertrophy and heart rate were determined through an electrocardiogram. The variables relating to blood tests were obtained after a minimum 12-hour fast. The BMI was calculated by measuring the weight and height of the patient with calibrated equipment and with the patients in underwear and barefoot. Blood pressure was measured using a sphygmomanometer and stethoscope following the procedures regulated by the guidelines recommended in the consensus to control hypertension in Spain published in 1990 by the Ministry of Health and Consumer Affairs.7 Finally, the ABI was measured through Doppler ultrasound equipment and a sphygmomanometer.8

Sample Size

The original recruitment of patients was intended to estimate the prevalence of peripheral arterial disease.8 With this sample size, we determined how many patients had to be recruited in each stratum (see Study design and participants). The study estimating that prevalence finished and the patients were followed in a new study to determine prognostic factors for CVD. Because the original sample size was not calculated for this purpose, the accuracy of the sample was estimated with the new objectives (to construct and validate a new diagnostic test): construction: A total of 823 people, of whom 76 had a CVD (9.23%), were included. Expecting to find a specificity of 75% and establishing a confidence level of 95%, the accuracy in estimating the specificity was 3.11%. Validation: in this sample, 26 of the 227 had a CVD. Assuming an area under the receiver operating characteristic curve (AUC) of 0.90 and establishing a confidence level of 95% to contrast an AUC different from 0.5, a power of contrast close to 100% was obtained.9
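The 3.11% figure can be reproduced with a short calculation. The sketch below assumes, since the article does not state the formula, that the precision is the half-width of a normal-approximation (Wald) confidence interval for a proportion, computed on the 747 construction-sample subjects without CVD; the function name `proportion_precision` is hypothetical.

```python
import math


def proportion_precision(p, n, z=1.959964):
    """Half-width of a Wald confidence interval for a proportion.

    p: expected proportion, n: number of subjects it is estimated on,
    z: standard normal quantile (default corresponds to 95% confidence).
    """
    return z * math.sqrt(p * (1.0 - p) / n)


n_total = 823                       # construction sample
n_events = 76                       # subjects with CVD
n_non_events = n_total - n_events   # specificity is estimated on these

precision = proportion_precision(0.75, n_non_events)
print(f"non-events: {n_non_events}, precision: {100 * precision:.2f}%")
```

Under that assumption the result rounds to 3.11%, in agreement with the value reported above.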

Statistical Methods

General: absolute and relative frequencies were used to describe the qualitative variables, whereas for quantitative variables means and standard deviations were used. All analyses were performed with α = 5% and for each relevant parameter, its associated confidence interval (CI) was calculated. All analyses were performed using IBM SPSS Statistics 19 (IBM, Armonk, NY), Epidat 3.1 (Junta de Galicia, Galicia, Spain), Microsoft Office Excel 2007 (Microsoft, Redmond, WA), and R 2.13.2.

Comparison of patients who completed the study and lost patients: a comparison between these 2 groups was performed using χ² tests (Pearson or Fisher) and Student t test, according to the type of each variable. Comparison of patients in the construction and validation samples: the patients who participated in the study (no losses) were randomly divided into 2 groups: the construction sample (80% of the sample) and the validation sample (20%). To verify that there were no differences between the 2 groups, the same tests as performed on the comparison of the losses were used.

TABLE 1. Descriptive Analysis of Patients Who Completed or Withdrew From the Study in Albacete (Spain), 1992 to 1994 Data

Variable                                    Completed the Study     Withdrew From the Study    P
                                            n = 1050                n = 192
                                            n (%) / x̄ ± s           n (%) / x̄ ± s
Sex
  Male                                      459 (49.7)              94 (49.0)                  0.179
  Female                                    591 (56.3)              98 (51.0)
Hypertension
  Yes                                       191 (18.0)              24 (12.5)                  0.055
  No                                        859 (81.8)              168 (87.5)
Diabetes
  Yes                                       78 (7.4)                10 (5.2)                   0.270
  No                                        972 (92.6)              182 (94.8)
Dyslipidemia
  Yes                                       146 (13.9)              15 (7.8)                   0.021
  No                                        904 (86.1)              177 (92.2)
Family History of Coronary Heart Disease
  Yes                                       108 (10.3)              19 (9.9)                   0.870
  No                                        942 (89.7)              173 (90.1)
Left Ventricular Hypertrophy
  Yes                                       27 (2.6)                3 (1.6)                    0.402
  No                                        1023 (97.4)             189 (98.4)
Occupational Physical Activity
  Heavy or moderate                         611 (58.2)              121 (63.0)                 0.211
  Light                                     439 (41.8)              71 (37.0)
Age (years)                                 47.4 ± 17.4             43.3 ± 18.2                0.003
BMI (kg/m²)                                 27.5 ± 4.9              26.7 ± 4.5                 0.046
SBP (mm Hg)                                 132.6 ± 21.3            127.1 ± 22.8               0.001
DBP (mm Hg)                                 81.6 ± 12.3             78.8 ± 13.3                0.004
Cigarettes per day                          5.6 ± 10.4              5.5 ± 9.0                  0.919
FBG (mmol/L)                                5.6 ± 1.7               5.4 ± 1.0                  0.186
TC (mmol/L)                                 5.2 ± 1.0               5.0 ± 1.0                  0.034
HDL-c (mmol/L)                              1.2 ± 0.3               1.2 ± 0.3                  0.392
Triglycerides (mmol/L)                      1.2 ± 0.8               1.1 ± 0.7                  0.081
Fibrinogen (mmol/L)                         0.099 ± 0.021           0.096 ± 0.019              0.071
Heart rate (bpm)                            73.5 ± 12.6             72.6 ± 12.5                0.363
ABI                                         1.06 ± 0.13             1.08 ± 0.12                0.026

ABI = ankle brachial index, BMI = body mass index, DBP = diastolic blood pressure, FBG = fasting blood glucose, HDL-c = high-density lipoprotein cholesterol, n (%) = absolute frequency (relative frequency), SBP = systolic blood pressure, TC = total cholesterol, x̄ ± s = mean ± standard deviation.

Construction of the model: in the construction sample (80%), a multivariate Cox regression model was performed to identify which variables were associated with CVD. For this, a forward stepwise algorithm based on the likelihood ratio test to determine those variables that may better predict CVD was used. The goodness-of-fit of the model was assessed with the likelihood ratio test and verification of the proportional risks was evaluated using the Schoenfeld residuals method. After estimating the model parameters (β coefficients), through the methodology of the Framingham study,10 a points system was constructed taking into account the specific weight of each variable in the development of CVD (β coefficients). In other words, the β coefficients were adapted to a system that can be used by health professionals systematically and without requiring the use of electronic devices to perform the calculations, because categorizations of cardiovascular risk factors are made to which a score is associated, according to the β coefficient of said factor obtained in the multivariate model.10 The points and their associated risk were calculated every 2 years up to a maximum of 14 years. The following cutoff points were chosen: optimal point: the score that minimized the square root of (1 − sensitivity)² + (1 − specificity)²; confirmation point: the minimum score value that had a positive likelihood ratio greater than or equal to 10; discard point: the maximum score that had a negative likelihood ratio of less than 0.1. Points 2 and 3 were chosen because, according to the School of Evidence Based Medicine, they are those that permit conclusive confirmation or discarding of positivity and negativity, respectively, of a diagnostic test. Furthermore, CVD risk groups were formed using the points obtained previously: low risk (below discard point), medium risk (from the discard point to the optimal point), high risk (from the optimal point to the confirmation point), and very high risk (greater than or equal to the confirmation point). This methodology has been used in similar studies.11,12
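The three cutoff rules can be sketched in a few lines. This is an illustration under stated assumptions, not the authors' code: a score at or above a cutoff is treated as a positive test, the likelihood ratio conditions are cross-multiplied to avoid division by zero, the risk-group intervals are taken as half-open, and the function names and toy data are invented.

```python
import math


def choose_cutoffs(scores, events):
    """Return (optimal, confirmation, discard) cutoffs for an integer score.

    scores: points per patient; events: 1 if the patient had CVD, else 0.
    A test is 'positive' when score >= cutoff (assumption of this sketch).
    """
    n_pos = sum(events)
    n_neg = len(events) - n_pos
    best_dist, optimal = math.inf, None
    confirmation, discard = None, None
    for c in sorted(set(scores)):
        tp = sum(1 for s, e in zip(scores, events) if e and s >= c)
        fp = sum(1 for s, e in zip(scores, events) if not e and s >= c)
        fn, tn = n_pos - tp, n_neg - fp
        # Optimal point: closest to perfect sensitivity and specificity
        dist = math.hypot(1 - tp / n_pos, 1 - tn / n_neg)
        if dist < best_dist:
            best_dist, optimal = dist, c
        # Confirmation point: minimum cutoff with PLR = sens / (1 - spec) >= 10
        if confirmation is None and tp > 0 and tp * n_neg >= 10 * fp * n_pos:
            confirmation = c
        # Discard point: maximum cutoff with NLR = (1 - sens) / spec < 0.1
        if tn > 0 and 10 * fn * n_neg < tn * n_pos:
            discard = c
    return optimal, confirmation, discard


def risk_group(score, optimal, confirmation, discard):
    """Map a score to a risk group (half-open intervals are an assumption)."""
    if score < discard:
        return "low"
    if score < optimal:
        return "medium"
    if score < confirmation:
        return "high"
    return "very high"


# Invented toy data: 5 patients with CVD and 20 without
event_scores = [5, 8, 9, 10, 12]
non_event_scores = [0, 0, 1, 1, 2, 2, 2, 3, 3, 3, 4, 4, 4, 5, 5, 5, 6, 6, 8, 9]
scores = event_scores + non_event_scores
events = [1] * len(event_scores) + [0] * len(non_event_scores)

optimal, confirmation, discard = choose_cutoffs(scores, events)
print(optimal, confirmation, discard)
```

With these toy data the discard, optimal and confirmation points come out as 5, 8 and 9, so a patient scoring 4 would be classed as low risk and one scoring 9 as very high risk.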

Validation: the points system was implemented in the validation sample (20%) and the AUC and the C-statistic were calculated. In addition, the survival of the risk groups was compared using the log-rank test and the survival curves were plotted using the Kaplan–Meier technique. Finally, the observed and expected events were plotted and compared every 2 years (up to a maximum of 14 years) using the χ² test.
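To make the discrimination measure concrete, the sketch below computes a concordance statistic for censored data in the style of Harrell's C. The article does not say which estimator was used, so this choice, the function name `harrell_c` and the toy data are assumptions made for illustration.

```python
def harrell_c(times, events, scores):
    """Concordance between a risk score and observed survival times.

    A pair (i, j) is usable when subject i has an observed event strictly
    before subject j's follow-up time; it is concordant when subject i also
    has the higher score. Ties in the score count as one half.
    """
    concordant = tied = usable = 0
    n = len(times)
    for i in range(n):
        if not events[i]:
            continue  # censored subjects cannot be the earlier member of a pair
        for j in range(n):
            if times[i] < times[j]:
                usable += 1
                if scores[i] > scores[j]:
                    concordant += 1
                elif scores[i] == scores[j]:
                    tied += 1
    return (concordant + 0.5 * tied) / usable


# Invented toy data: follow-up in years, event indicator, points
times = [2, 4, 6, 8, 14]
events = [1, 1, 0, 1, 0]
scores = [9, 5, 6, 7, 1]

c_statistic = harrell_c(times, events, scores)
print(c_statistic)
```

In this toy example 6 of the 8 usable pairs are concordant, giving 0.75; a value of 0.5 corresponds to no discrimination and 1 to perfect discrimination.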

TABLE 2. Descriptive Analysis for Construction and Validation Samples in Albacete (Spain), 1992 to 1994 Data

Variable                                    Construction Sample     Validation Sample          P
                                            n = 823                 n = 227
                                            n (%) / x̄ ± s           n (%) / x̄ ± s
Cardiovascular Disease
  Yes                                       76 (9.2)                26 (11.5)                  0.318
  No                                        747 (90.8)              201 (88.5)
Sex
  Male                                      355 (43.1)              104 (45.8)                 0.471
  Female                                    468 (56.9)              123 (54.2)
Hypertension
  Yes                                       156 (19.0)              35 (15.4)                  0.221
  No                                        667 (81.0)              192 (84.6)
Diabetes
  Yes                                       63 (7.7)                15 (6.6)                   0.594
  No                                        760 (92.3)              212 (93.4)
Dyslipidemia
  Yes                                       118 (14.3)              28 (12.3)                  0.440
  No                                        705 (85.7)              199 (87.7)
Family History of Coronary Heart Disease
  Yes                                       91 (11.1)               17 (7.5)                   0.117
  No                                        732 (88.9)              210 (92.5)
Left Ventricular Hypertrophy
  Yes                                       24 (2.9)                3 (1.3)                    0.179
  No                                        799 (97.1)              224 (98.7)
Occupational Physical Activity
  Heavy or moderate                         480 (58.3)              131 (57.7)                 0.868
  Light                                     343 (41.7)              96 (42.3)
Age (years)                                 47.4 ± 17.1             47.1 ± 18.0                0.794
BMI (kg/m²)                                 27.6 ± 5.0              27.3 ± 4.8                 0.490
SBP (mm Hg)                                 133.3 ± 21.5            130.3 ± 20.7               0.061
DBP (mm Hg)                                 81.7 ± 12.5             81.1 ± 11.7                0.473
Cigarettes per day                          5.6 ± 10.7              5.9 ± 9.5                  0.663
FBG (mmol/L)                                5.6 ± 1.7               5.5 ± 1.7                  0.666
TC (mmol/L)                                 5.2 ± 1.0               5.2 ± 1.0                  0.346
HDL-c (mmol/L)                              1.2 ± 0.3               1.2 ± 0.3                  0.610
Triglycerides (mmol/L)                      1.2 ± 0.8               1.2 ± 0.7                  0.769
Fibrinogen (mmol/L)                         0.099 ± 0.021           0.099 ± 0.020              0.699
Heart rate (bpm)                            73.6 ± 12.8             73.2 ± 12.0                0.599
ABI                                         1.06 ± 0.13             1.07 ± 0.14                0.054

ABI = ankle brachial index, BMI = body mass index, DBP = diastolic blood pressure, FBG = fasting blood glucose, HDL-c = high-density lipoprotein cholesterol, n (%) = absolute frequency (relative frequency), SBP = systolic blood pressure, TC = total cholesterol, x̄ ± s = mean ± standard deviation.

Ethical Issues

The study was approved by the Ethics Committee of the General University Hospital of Albacete and carried out in accordance with the ethical standards set forth in the Declaration of Helsinki of 1964 and its subsequent amendments. All patients gave written informed consent before inclusion in the study.

RESULTS
Of 1322 patients, 80 were excluded for having a history of CVD, leaving a total of 1242 individuals who were followed until the development of a cardiovascular event. During the study period, no information was obtained from 192 patients, thus they were excluded. Table 1 shows the characteristics of those who remained in the study and those who dropped out, highlighting some significant differences between the 2 samples (dyslipidemia, age, BMI, SBP, diastolic blood pressure, TC, and ABI).

Once the sample was divided into 2 groups (80% and 20%), no statistically significant differences were seen (P-values between 0.054 and 0.868). In these samples, most participants were women, there were patients with cardiovascular risk factors, a majority of the patients carried out moderate or intense activity at work, and the average age was approximately 47 years (see Table 2).

The construction sample consisted of 823 individuals, of whom 76 had a CVD during the follow-up (33 fatal), equivalent to an incidence density of 82 cases per 10,000 person-years (95% CI: 65–103; 36 fatal, 95% CI: 25–50). Regarding the multivariate model (Table 3), the following profile of prognostic factors for CVD was obtained: male sex, diabetes, left ventricular hypertrophy, light occupational physical activity, older age, higher value of SBP × heart rate, higher number of cigarettes per day, and higher values of TC. The overall model was highly significant (P < 0.001), that is, our model explained the development of CVD better than the null model (model with no explanatory variables), and provided the following cutoff points: optimal, 10 (square root = 0.343); discard, 7 (negative likelihood ratio = 0.07); and confirmation, 14 (positive likelihood ratio = 10.56). The probabilities of each of the possible scores, along with its associated risk every 2 years (up to a maximum of 14 years) are reflected in Table 4. The scoring system constructed through the multivariate model is shown in Figure 1.

TABLE 3. Multivariable-Adjusted Cox Proportional Hazards Regression Coefficients for 14-Year Risk of Cardiovascular Disease in Albacete (Spain), 1992 to 1994 Data

| Variable | β Coefficient | SE | Adjusted HR | 95% CI (Adjusted HR) | P |
|---|---|---|---|---|---|
| Male sex | 0.717 | 0.277 | 2.047 | 1.190–3.522 | 0.010 |
| Diabetes | 0.600 | 0.296 | 1.823 | 1.021–3.254 | 0.042 |
| LVH | 0.948 | 0.365 | 2.580 | 1.262–5.276 | 0.009 |
| Heavy or moderate occupational PA | −0.312 | 0.253 | 0.732 | 0.446–1.201 | 0.216 |
| Age (years) | 0.089 | 0.012 | 1.094 | 1.069–1.119 | <0.001 |
| SBP × heart rate (per 1000 mm Hg × bpm) | 0.055 | 0.045 | 1.057 | 0.968–1.154 | 0.220 |
| Cigarettes per day | 0.016 | 0.013 | 1.016 | 0.991–1.042 | 0.214 |
| TC (mmol/L) | 0.248 | 0.127 | 1.281 | 0.999–1.643 | 0.050 |

No TC measurement was available for 27 patients (3.28%). There were no missing data for the rest of the variables. The final model was therefore constructed with 796 patients.

Goodness-of-fit of the model: χ² = 144.1, P < 0.001.

CI = confidence interval, HR = hazard ratio, LVH = left ventricular hypertrophy, PA = physical activity, SBP = systolic blood pressure, SE = standard error, TC = total cholesterol.

TABLE 4. Likelihood (%) of Having a Cardiovascular Disease by Score and Follow-Up Time in Albacete (Spain), 1992 to 1994 Data

| Score / Years | 2 | 4 | 6 | 8 | 10 | 12 | 14 |
|---|---|---|---|---|---|---|---|
| −1 | 0.01 | 0.02 | 0.03 | 0.04 | 0.06 | 0.09 | 0.11 |
| 0 | 0.01 | 0.03 | 0.04 | 0.07 | 0.09 | 0.15 | 0.17 |
| 1 | 0.02 | 0.04 | 0.07 | 0.11 | 0.15 | 0.23 | 0.27 |
| 2 | 0.03 | 0.07 | 0.10 | 0.17 | 0.23 | 0.36 | 0.42 |
| 3 | 0.05 | 0.11 | 0.16 | 0.27 | 0.36 | 0.57 | 0.66 |
| 4 | 0.07 | 0.17 | 0.25 | 0.42 | 0.56 | 0.88 | 1.03 |
| 5 | 0.11 | 0.26 | 0.39 | 0.65 | 0.88 | 1.38 | 1.61 |
| 6 | 0.18 | 0.41 | 0.62 | 1.02 | 1.38 | 2.15 | 2.51 |
| 7 | 0.28 | 0.64 | 0.96 | 1.59 | 2.14 | 3.34 | 3.89 |
| 8 | 0.43 | 0.99 | 1.50 | 2.47 | 3.33 | 5.18 | 6.02 |
| 9 | 0.67 | 1.55 | 2.34 | 3.84 | 5.16 | 7.98 | 9.25 |
| 10 | 1.05 | 2.41 | 3.63 | 5.94 | 7.95 | 12.19 | 14.08 |
| 11 | 1.64 | 3.75 | 5.62 | 9.13 | 12.15 | 18.40 | 21.13 |
| 12 | 2.55 | 5.80 | 8.65 | 13.91 | 18.34 | 27.23 | 31.01 |
| 13 | 3.96 | 8.92 | 13.20 | 20.89 | 27.16 | 39.18 | 44.04 |
| 14 | 6.13 | 13.60 | 19.86 | 30.68 | 39.07 | 54.05 | 59.67 |
| 15 | 9.42 | 20.43 | 29.26 | 43.62 | 53.93 | 70.36 | 75.83 |
| 16 | 14.33 | 30.05 | 41.81 | 59.19 | 70.24 | 85.07 | 89.15 |
| 17 | 21.49 | 42.82 | 57.12 | 75.38 | 84.97 | 94.89 | 96.90 |
| 18 | 31.50 | 58.28 | 73.40 | 88.83 | 94.84 | 99.05 | 99.56 |
| 19 | 44.66 | 74.52 | 87.39 | 96.75 | 99.03 | 99.93 | 99.98 |
| 20 | 60.36 | 88.21 | 96.08 | 99.53 | 99.93 | 100.00 | 100.00 |
| 21 | 76.48 | 96.47 | 99.37 | 99.98 | 100.00 | 100.00 | 100.00 |

FIGURE 1. Predictive model to determine which patients will suffer a cardiovascular disease within a maximum period of 14 years. Definition of occupational physical activity: light activity: activity associated with sitting at a desk or behind a counter with automated instruments; moderate activity: continuous light physical activity, such as light work in industry or in agriculture out of season; intense activity: heavy work and, at times, energetic (agricultural production, mining, or steel work). If a person was not working, he or she was classified as performing light activity. LVH = left ventricular hypertrophy, PA = physical activity, SBP = systolic blood pressure.

As regards the validation sample, a C-statistic value of 0.886 (standard error = 0.061) and an AUC with a very similar value were obtained (Fig. 2). When comparing survival between the different risk groups (Fig. 3), we observed that as the risk category increased, the probability of experiencing a CVD increased very significantly (P < 0.001). Finally, comparison between the expected and observed events in all risk groups every 2 years (Fig. 4) showed no statistically significant differences (P-values between 0.49 and 0.75).

FIGURE 2. Area under the receiver operating characteristic curve of the points system in the validation sample. AUC = area under the receiver operating characteristic curve, CI = confidence interval.
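The adjusted hazard ratios and confidence intervals in Table 3 follow from the β coefficients and standard errors through HR = exp(β) and 95% CI = exp(β ± 1.96 × SE). The sketch below applies this standard relationship to the published male-sex row; the function name is hypothetical, and because the table reports rounded coefficients the result only approximately matches the tabulated 2.047 (1.190–3.522).

```python
import math


def hazard_ratio_with_ci(beta, se, z=1.96):
    """Return (HR, lower, upper) from a Cox coefficient and its standard error."""
    hr = math.exp(beta)
    lower = math.exp(beta - z * se)
    upper = math.exp(beta + z * se)
    return hr, lower, upper


# Male sex row of Table 3: beta = 0.717, SE = 0.277 (rounded values as published).
hr, lower, upper = hazard_ratio_with_ci(0.717, 0.277)
```

The same call with β = −0.312 and SE = 0.253 gives a hazard ratio below 1, which is how the protective direction of heavy or moderate occupational activity appears in the table.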

DISCUSSION

Summary
This study constructed and validated a cardiovascular risk table with primary data to determine the risk every 2 years, up to a maximum of 14 years. The results enable us to make decisions in the short, medium, and long term to prevent CVD in a new patient.

FIGURE 3. Differences between the different risk groups constructed in the validation sample.


Strengths and Limitations of the Study
The main strength of this study is the methodology followed in constructing this cardiovascular risk scale because, unlike other risk tables [2], it enables us to make decisions in the short and medium term. Thus, if a new patient has a high probability of developing CVD in the short term, we should carry out more intensive treatment of the factors in our points system on which it is possible to act (TC, SBP, and smoking). Moreover, we have included new factors that do not appear in the cardiovascular risk tables used [2]. Finally, regarding statistical issues, a very high discriminatory power was obtained.

To minimize possible selection bias, we used a random sample design that considered the whole population in the province we were analyzing. Information bias was minimized because calibrated and validated equipment was used, and great care was taken when measuring all parameters. In addition, all the medical records and hospital reports of each participant were comprehensively reviewed. Finally, although we collected our data in the early 1990s (baseline) and early 2000s (second visit), we must also consider that the data collection for both the Framingham Heart Study and the SCORE project was performed before our study and these tables are still being used today in daily clinical practice around the world [2].

Regarding statistical issues, the sample size used was smaller than in other risk scales [2]. However, we must bear in mind that this size was sufficient for the proposed objectives (validation power close to 100%), which, added to a sample design of the defined characteristics, provided great validity to our findings. The constructed model included factors that were nonsignificant independently (each factor separately). However, when constructing the model, the choice of variables was performed using a forward stepwise algorithm based on the likelihood ratio test; thus, when these factors were entered into the model they did reach statistical significance (P < 0.05). We must also highlight that we are considering the totality of the constructed model [13], that is, its discriminatory capacity and the obtaining of results similar to reality (observed–expected comparison), which as a whole was very satisfactory (C-statistic and AUC close to 90%, and no differences found between observed and expected events).

Comparison With the Existing Literature
Our methodology has differences when compared with the models mainly used in Europe and the United States (SCORE and Framingham). First, we must take into account that these models only allow for long-term predictions, whereas ours enables decision making in the short, medium, and long term. The type of sampling used in this study was completely random, whereas the Framingham study participants were volunteers and in SCORE, there were some patients from working population cohorts. Finally, the age range of the Framingham and SCORE studies is more restricted, whereas our study allows the constructed scale to be applied to all adults (≥18 years). The Framingham and SCORE studies, however, have a minimum and maximum patient age for their use [2].

The factors found in our model are consistent with the current literature [2] except that the product of the heart rate and SBP is used in a novel way in our model. As this variable has shown its weight in predicting CVD [5], its weight is logical and expected when determining which patients will develop CVD. Secondly, work activity other than sitting at a desk or behind a counter with automated instruments had a protective nature in our outcome. Given that other studies have found that exercise has this protective character and adherence to exercise is not poor (because otherwise the patient could not perform their work properly), it makes sense to find this factor in our risk scale as well [14]. Finally, quantifying smoking with the number of cigarettes enables setting targets in patients who are smokers so that a partial reduction of the habit could lead to a decrease in cardiovascular risk. As this factor has only been assessed in a binary form in the SCORE and Framingham studies, a partial reduction in the number of cigarettes in the patient who smokes does not produce a decreased risk in those scales [2].

Assessing the discriminating power of the model in determining which patients will suffer a CVD gave an AUC of 0.90 and a C-statistic very similar to this value. The other scales obtained a maximum value of 0.82 in their internal validation [2]. Hence, if we apply our model in other geographical areas and obtain a value similar to that found in our validation sample, our risk scale could become an alternative to the scales of previous studies. Nonetheless, because heart rate and work activity must be included, these variables should be taken into account in the validation of our predictive model on data from cohort studies conducted in other geographical areas.

Implications for Research and Practice
The preparation and internal validation of this new cardiovascular risk scale with a higher discriminating power than those currently known indicates that, if similar results are obtained in other populations, our predictive model could become a reference when calculating the cardiovascular risk in the general population. We must be cautious, though, because we only applied this scale in the province of Albacete. The authors propose the validation of this predictive model in other populations. If this validation achieves results similar to ours, we can make decisions in the short, medium, and long term, agreeing upon realistic targets with smokers (partial reduction), controlling heart rate, and keeping in mind the work activity of the participant, in addition to other known and treated risk factors in preventing CVD.

CONCLUSIONS
This study developed and validated a points system able to determine, with a very high discriminating power, which patients will develop a CVD within a maximum period of 14 years. This discriminating power was higher than in the known scales. Therefore, if these results are maintained in validation studies in other geographical areas, the cardiovascular risk scale prepared in this study may be proposed as a tool for use in clinical practice to reduce the incidence of CVD in the general population.

FIGURE 4. Comparison between observed and expected events in the validation sample.

ACKNOWLEDGMENTS

In memory of Angel Puras Tellaeche (†) who was Chairman of the Vascular Diseases Group of Albacete (GEVA) and was its driving force. With extensive knowledge in cardiovascular epidemiology, he was able to convey to the group the necessary methodological rigor for the study of CVD and its risk factors. The authors thank Maria Repice and Ian Johnstone for their assistance in the English version of the text and their helpful comments.

GEVA (Vascular Diseases Group of Albacete): Ponce-Garcia I, Simarro-Rueda M, Carbayo-Herencia JA, Divison-Garrote JA, Artigao-Rodenas LM*, Gil-Guillen V, Sanchis-Domenech C, Masso-Orozco J, Torres-Moreno P, Navarro-Sanchez L, Gonzalez-Lozano B, Martinez-Ramirez M, Martinez-Navarro E, Rodriguez-Panos B, Garcia-Gosalvez F, Molina-Escribano F, Martinez-Lopez R, Lopez-Abril J, Caldevilla-Bernardo D, Lopez-de-Coca-y-Fernandez-Valencia E, Argandona-Palacios E, Monedero-La Orden J, and Campayo-Serrano A.

*Lead author [email protected].

REFERENCES

1. World Health Organization. The Top 10 Causes of Death. 2014. http://www.who.int/mediacentre/factsheets/fs310/en/. Updated May. Accessed May, 2015.

2. Cooney MT, Dudina AL, Graham IM. Value and limitations of existing scores for the assessment of cardiovascular risk: a review for clinicians. J Am Coll Cardiol. 2009;54:1209–1227.

3. Sullivan LM, Massaro JM, D’Agostino RB Sr. Presentation of multivariate data for clinical use: the Framingham Study risk score functions. Stat Med. 2004;23:1631–1660. Review.

4. Statistics National Institute. Population figures and Demographic Censuses. Population and Housing Census 1991, 1991. http://www.ine.es/jaxi/menu.do?type=pcaxis&path=%2Ft20%2Fe243&file=inebase&L=0. Updated January, 1991. Accessed May, 2015.

5. Inoue R, Ohkubo T, Kikuya M, et al. Predictive value for mortality of the double product at rest obtained by home blood pressure measurement: the Ohasama study. Am J Hypertens. 2012;25:568–575.

6. Food and Agriculture Organization of the United Nations and World Health Organization. Energy and Protein Requirements: Report of a Joint FAO/WHO ad Hoc Expert Committee. FAO Nutrition Meetings Report Series No. 52. WHO Technical Report Series No. 522. Rome, Italy and Geneva, Switzerland: FAO/WHO; 1973.

7. Ministry of Health. Consensus for the Control of Hypertension in Spain. Ministry of Health, Madrid, Spain.

8. Carbayo JA, Divison JA, Escribano J, et al., Grupo de Enfermedades Vasculares de Albacete (GEVA). Using ankle-brachial index to detect peripheral arterial disease: prevalence and associated risk factors in a random population sample. Nutr Metab Cardiovasc Dis. 2007;17:41–49.

9. Hanley JA, McNeil BJ. The meaning and use of the area under a receiver operating characteristic (ROC) curve. Radiology. 1982;143:29–36.

10. Sullivan LM, Massaro JM, D’Agostino RB Sr. Presentation of multivariate data for clinical use: the Framingham Study risk score functions. Stat Med. 2004;23:1631–1660.

11. Ramirez-Prado D, Palazon-Bru A, Folgado-de-la Rosa DM, et al. Predictive models for all-cause and cardiovascular mortality in type 2 diabetic inpatients. A cohort study. Int J Clin Pract. 2015;69:474–484.

12. Palazon-Bru A, Martinez-Orozco MJ, Perseguer-Torregrosa Z, et al. Construction and validation of a model to predict nonadherence to guidelines for prescribing antiplatelet therapy to hypertensive patients. Curr Med Res Opin. 2015;31:883–889.

13. Lopez-Bru D, Palazon-Bru A, Folgado-de la Rosa DM, et al. Scoring system for mortality in patients diagnosed with and treated surgically for differentiated thyroid carcinoma with a 20-year follow-up. PLoS One. 2015;10:e0128620.

14. Fogelholm M. Physical activity, fitness and fatness: relations to mortality, morbidity and disease risk factors. A systematic review. Obes Rev. 2010;11:202–221.


Submitted 8 February 2015
Accepted 8 May 2015
Published 2 June 2015

Corresponding author: Antonio Palazon-Bru, [email protected]

Academic editor: Daniela Foti

Additional Information and Declarations can be found on page 8

DOI 10.7717/peerj.984

Copyright 2015 Ramírez-Prado et al.

Distributed under Creative Commons CC-BY 4.0

OPEN ACCESS

A four-year cardiovascular risk score for type 2 diabetic inpatients

Dolores Ramírez-Prado (1,2), Antonio Palazon-Bru (1,2), David Manuel Folgado-de la Rosa (2), María Angeles Carbonell-Torregrosa (3), Ana María Martínez-Díaz (3), Damian Robert James Martínez-St. John (2) and Vicente Francisco Gil-Guillen (1,2)

1 Research Unit, Elda Hospital, Elda, Alicante, Spain
2 Department of Clinical Medicine, Miguel Hernandez University, San Juan de Alicante, Alicante, Spain
3 Emergencies Unit, Elda Hospital, Elda, Alicante, Spain

ABSTRACT
As cardiovascular risk tables currently in use were constructed using data from the general population, the cardiovascular risk of patients admitted via the hospital emergency department may be underestimated. Accordingly, we constructed a predictive model for the appearance of cardiovascular diseases in patients with type 2 diabetes admitted via the emergency department. We undertook a four-year follow-up of a cohort of 112 adult patients with type 2 diabetes admitted via the emergency department for any cause except patients admitted with acute myocardial infarction, stroke, cancer, or a palliative status. The sample was selected randomly between 2010 and 2012. The primary outcome was time to cardiovascular disease. Other variables (at baseline) were gender, age, heart failure, renal failure, depression, asthma/chronic obstructive pulmonary disease, hypertension, dyslipidaemia, insulin, smoking, admission for cardiovascular causes, pills per day, walking habit, fasting blood glucose and creatinine. A cardiovascular risk table was constructed based on the score to estimate the likelihood of cardiovascular disease. Risk groups were established and the c-statistic was calculated. Over a mean follow-up of 2.31 years, 39 patients had cardiovascular disease (34.8%, 95% CI [26.0–43.6%]). Predictive factors were gender, age, hypertension, renal failure, insulin, admission due to cardiovascular reasons and walking habit. The c-statistic was 0.734 (standard error: 0.049). After validation, this study will provide a tool for the primary health care services to enable the short-term prediction of cardiovascular disease after hospital discharge in patients with type 2 diabetes admitted via the emergency department.

Subjects: Diabetes and Endocrinology, Epidemiology, Public Health
Keywords: Primary Health Care, Emergencies, Diabetes mellitus, Cardiovascular diseases, Predictive models

How to cite this article: Ramírez-Prado et al. (2015), A four-year cardiovascular risk score for type 2 diabetic inpatients. PeerJ 3:e984; DOI 10.7717/peerj.984

INTRODUCTION
Cardiovascular diseases (CVD) constitute one of the main causes of death worldwide, and one of the main reasons for admission via the hospital emergency department (ED) (Fan et al., 2011; World Health Organization, 2014). The most important risk factors for CVD include diabetes mellitus, hypertension, dyslipidaemia, obesity and smoking (World Health Organization, 2007). These factors are all prevalent among patients admitted via the ED (Cinza Sanjurjo et al., 2006; Fan et al., 2011).

One in every six ED admissions among diabetic patients is related to the diabetes itself,

with almost half of these admissions due to glycaemic decompensation. The other main

reasons (unrelated to the diabetes) for ED admissions among these patients are lesions and

poisonings (Hinojosa Mena-Bernal et al., 2004).

We are unaware of any studies in patients with type 2 diabetes admitted via the ED that have analyzed the onset of CVD and constructed a predictive model to indicate which of these patients have a greater likelihood of presenting CVD. Although there exist cardiovascular risk tables constructed with data from the general population, health centres, working persons and volunteers, the results of these tables are not based on the follow-up of patients with specific disorders, such as type 2 diabetes (Cooney, Dudina & Graham, 2009). Thus, the cardiovascular risk obtained from these tables might be underestimated, as diabetic persons admitted via the ED are highly heterogeneous and differ in important ways from the type of patients used for the construction of these scales and tables. For example, diabetic patients admitted via the ED have, a priori, more disorders. Accordingly, we undertook a study with a four-year follow-up at the Elda Hospital (Spain) to construct a predictive model of CVD. Once validated (by reproducing our results in other populations) and after hospital discharge, this model could be used preventively by the primary health care services with the aim of reducing the cardiovascular mortality and morbidity in patients with type 2 diabetes admitted via the ED.

MATERIALS & METHODS
Study population, design and participants, ethical considerations
The study population was formed by diabetic patients admitted via the ED in the Valle de Elda healthcare area (Valencian community), an industrial area with 198,090 inhabitants with a low-to-medium socioeconomic level (Martínez-Orozco et al., 2015). The ED of Elda Hospital (a public institution) attends to about 160 general emergency cases daily among the adult population, not including obstetric and gynaecologic cases (Carbonell Torregrosa et al., 2014).

The study cohort comprised type 2 diabetic patients admitted for any reason via the ED

of Elda Hospital (only hospital in the healthcare area), aged >13 years (patients younger

than 13 years are seen by the paediatric services), who were willing to participate. The

follow-up was four years. Patients were excluded if they were pregnant or had a personal

history of acute myocardial infarction, stroke, or cancer, or were receiving palliative care. A

random sample was selected from all patients admitted via the ED between January 2010

and March 2012. The sampling procedure involved random selection of one day every

week and recruiting all the diabetic patients who fulfilled the criteria and were admitted on

that day.

Patients with type 1 diabetes were not included in this study because they generally have different characteristics to patients with type 2 diabetes; for example, patients with type 1 diabetes are usually younger. Grouping together two non-homogenous groups of patients would produce results that would not really be useful in daily clinical practice. For this reason, most authors usually analyze different outcomes depending on the type of diabetes (Ramírez-Prado et al., 2015). Each patient was followed from the recruitment date until he or she had a CVD, whether fatal or not. If no CVD developed, the patient was followed for four years (if still alive), or until the date of last clinical contact (assuming the patient had no CVD by this date).

The study posed no additional risk to the patients and an indirect benefit was expected,

as the results might reduce short-term cardiovascular morbidity and mortality in this

type of patient. The study was carried out in compliance with the principles of the World

Medical Association Declaration of Helsinki and complied with the European Union

norms of good clinical practice. The patients were informed verbally about the study and

about the information required. The study was approved by the Ethics Committee of the

Elda Department of Health (Ref. UI13016).

Variables and measurements
The main outcome variable was cardiovascular morbidity or mortality during the

four-year follow-up. Cardiovascular conditions were considered to be those affecting

the heart or blood vessels (cerebrovascular, legs, kidneys or heart) (Bonny et al., 2008).

Data collected at admission (baseline) included gender, age (years), personal history of

diseases (heart failure, renal failure, depression, asthma/chronic obstructive pulmonary

disease (COPD), hypertension and dyslipidaemia), use of insulin, smoking, admission

due to cardiovascular reasons, number of tablets per day (usual medication for whatever

condition, excluding diabetes therapy), walking habit, fasting blood glucose (mmol/L) and

creatinine (µmol/L).

Cardiovascular morbidity and mortality was assessed during the four-year follow-up by regularly checking the hospital and health centre records. In the case of any doubt about death, contact was made with the patient (if alive) or the patient’s relatives, or with the patient’s assigned physician (if there was still doubt). Information about gender, age, personal history of diseases, smoking, taking of insulin, and number of pills daily was obtained by patient interview and corroborated from the medical records. Information about walking habits was obtained only at the interview. Data regarding admissions were obtained from the hospital records. The baseline fasting blood glucose and the creatinine were measured according to the current clinical guidelines (American Diabetes Association, 2014).

Sample size and statistical methods
The final cohort sample was 112 patients. Assuming 95% confidence, an expected censored proportion of 60%, an exposure proportion of 35% and an expected hazard ratio (HR) of 2.50, the power to detect an HR different from 1 was calculated. The resulting value, obtained from implementing the formula for the power in an Excel spreadsheet and solving it with the Solver tool, was 83.28%.
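The text does not name the power formula that was entered in the spreadsheet. As a hedged illustration, the sketch below assumes Schoenfeld's approximation for a binary exposure in a Cox model, in which the expected number of events is n × (1 − censored proportion); the function name is hypothetical. With the inputs stated above it returns a power of about 0.83, in line with the reported 83.28%.

```python
import math
from statistics import NormalDist


def cox_power(n, censored, exposed, hazard_ratio, alpha=0.05):
    """Approximate two-sided power for a binary covariate in a Cox model.

    Assumes Schoenfeld's approximation; this choice is an assumption,
    since the source does not state which formula was used.
    """
    events = n * (1.0 - censored)                      # expected number of events
    z_alpha = NormalDist().inv_cdf(1.0 - alpha / 2.0)  # critical value, e.g. 1.96
    z_effect = math.sqrt(events * exposed * (1.0 - exposed)) * abs(math.log(hazard_ratio))
    return NormalDist().cdf(z_effect - z_alpha)


power = cox_power(n=112, censored=0.60, exposed=0.35, hazard_ratio=2.50)
```

The power grows with the number of events rather than with the number of patients, which is why the censored proportion enters the calculation directly.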

As smoking and walking had missing values (32.4% and 24.1%, respectively), 100 multiple imputations were made beforehand using logistic regression switching with predictive mean matching. This is the most suitable procedure when the proportion of missing data is between 10% and 50%. In this way we were able to work with all the variables (Marshall, Altman & Holder, 2010).

Absolute and relative frequencies were used to describe the qualitative variables, with means and standard deviations for the quantitative variables. A Cox multivariate regression model was constructed to determine which variables were associated with cardiovascular morbidity and mortality, calculating the HR. As we had few patients, we set a maximum number of explanatory variables in the model. As a heuristic rule we considered there needed to be at least 10 observations of morbidity and mortality or no morbidity and mortality for each explanatory variable. To obtain the variables in the model we analyzed all the possible combinations with a maximum of 7 variables (16,383), calculating the value of the c-statistic in all of them. The combination with the highest value was then selected. The c-statistic is similar to the area under the ROC curve, but the former takes into account censoring. The goodness of fit of the model was assessed by the score (log-rank) test. Using the β coefficients of the multivariate model a risk table was constructed based on the sum of the points to estimate the likelihood of CVD (Sullivan, Massaro & D’Agostino, 2004). After calculating the scores and their associated risk, risk groups were designed: low risk (<5th percentile), medium risk (from the 5th percentile to the median), high risk (from the median to the 95th percentile), and very high risk (≥95th percentile). All the analyses were done with an α = 5% and for each relevant parameter the associated confidence interval (CI) was calculated. All the analyses were done with IBM SPSS Statistics 19 and R 2.13.2.
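The variable-selection rule described above can be sketched as follows. The sketch reads the heuristic as "ten observations in the larger outcome group per explanatory variable", which is an interpretation rather than something the text states outright; with the 39 events among 112 patients reported in the Results it gives the limit of 7 variables. Both helper names are hypothetical, and the evaluation of the c-statistic for each candidate subset is deliberately left out.

```python
from itertools import combinations


def max_predictors(n_events, n_total, per_variable=10):
    """Upper limit on model size under the assumed 10-observations rule."""
    larger_group = max(n_events, n_total - n_events)
    return larger_group // per_variable


def candidate_subsets(names, max_size):
    """Yield every non-empty subset of `names` with at most `max_size` members."""
    for size in range(1, max_size + 1):
        yield from combinations(names, size)


limit = max_predictors(n_events=39, n_total=112)

# Small demonstration with three hypothetical variable names.
small_demo = list(candidate_subsets(["age", "gender", "insulin"], 2))
```

In an exhaustive search, each yielded subset would be fitted as a Cox model and the subset with the highest c-statistic retained.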

RESULTS
Of a total of 115 patients who fulfilled the inclusion criteria, three were excluded because there was no further contact after the initial visit (lost during the follow-up). Thus, the final sample comprised 112 patients.

Over a mean follow-up of 2.3 ± 1.6 years, 39 of the 112 patients had CVD (34.8%, 95%

CI [26.0–43.6%]). Of these, 22 were fatal (19.6%, 95% CI [12.3–27.0%]) (cardiac arrest,

12; ischaemic heart disease, 3; heart failure, 3; stroke, 3; peripheral arterial disease, 1) and

17 were non fatal (15.2%, 95% CI [8.5–21.8%]) (ischaemic heart disease, 7; heart failure, 6;

atrial fibrillation, 2; renal failure, 1; pericarditis, 1) (Table 1). This represents an incidence

density of 150 CVD for each 1,000 person-years (95% CI [107–206] CVD per 1,000

person-years), of which 104 were fatal (95% CI [69–152] CVD per 1,000 person-years)

and 46 non fatal (95% CI [89–273] CVD per 1,000 person-years).
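The overall proportion and incidence density can be checked with a few lines of arithmetic. The sketch below assumes a Wald (normal approximation) interval for the proportion, which the text does not state but which reproduces the reported 26.0–43.6%, and it takes total person-years as the number of patients multiplied by the mean follow-up of 2.31 years given in the abstract. The function names are hypothetical, and only the overall figures are illustrated.

```python
import math


def wald_proportion_ci(events, n, z=1.96):
    """Proportion with a normal-approximation (Wald) confidence interval."""
    p = events / n
    half_width = z * math.sqrt(p * (1.0 - p) / n)
    return p, p - half_width, p + half_width


def incidence_density(events, n, mean_followup_years, per=1000):
    """Events per `per` person-years, with person-years = n * mean follow-up."""
    person_years = n * mean_followup_years
    return per * events / person_years


p, lo, hi = wald_proportion_ci(39, 112)
rate = incidence_density(39, 112, 2.31)
```

Here `p`, `lo` and `hi` are fractions (multiply by 100 for percentages), and `rate` comes out at roughly 150 CVD per 1,000 person-years.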

Table 1 shows the descriptive and analytical characteristics of the study patients. The mean age was advanced (70.5 years); the youngest patient was 34 years old. There was a high prevalence of comorbidity (heart failure, 13.4%; renal failure, 8.9%; depression, 8.9%; asthma/COPD, 13.4%; hypertension, 75.0%; dyslipidaemia, 42.9%) and a very high mean number of daily pills (5.6). Concerning lifestyle habits, 21.4% of the patients smoked and 26.8% had the habit of walking. For the diabetes-related variables, 43.8% used insulin and the mean baseline fasting blood glucose was 8.4 mmol/L. Notably, 26.8% of the patients were admitted with a cardiovascular problem.




Table 1 Baseline characteristics and adjusted hazard ratios for cardiovascular disease for type 2 diabetic inpatients in a Spanish region, 2010–2012 data.

| Variable | Total (n = 112), n (%) / x ± s | HR | 95% CI | p-value |
|---|---|---|---|---|
| Cardiovascular morbidity: | | N/A | N/A | N/A |
| Ischemic heart disease | 7 (6.2) | | | |
| Heart failure | 6 (5.4) | | | |
| Atrial fibrillation | 2 (1.8) | | | |
| Renal failure | 1 (0.9) | | | |
| Pericarditis | 1 (0.9) | | | |
| Cardiovascular mortality: | | N/A | N/A | N/A |
| Cardiac arrest | 12 (10.7) | | | |
| Ischemic heart disease | 3 (2.7) | | | |
| Heart failure | 3 (2.7) | | | |
| Cerebral haemorrhage | 3 (2.7) | | | |
| Peripheral arterial disease | 1 (0.9) | | | |
| Male gender | 59 (52.7) | 1.84 | 0.90–3.75 | 0.095 |
| Age (years) | 70.5 ± 12.4 | 1.04 | 1.00–1.08 | 0.031 |
| Depression | 10 (8.9) | N/M | N/M | N/M |
| Asthma/COPD | 15 (13.4) | N/M | N/M | N/M |
| Hypertension | 84 (75.0) | 1.11 | 0.47–2.62 | 0.804 |
| Dyslipidaemia | 48 (42.9) | N/M | N/M | N/M |
| Heart failure | 15 (13.4) | N/M | N/M | N/M |
| Renal failure | 10 (8.9) | 2.76 | 1.01–7.59 | 0.048 |
| Insulin | 49 (43.8) | 1.56 | 0.77–3.16 | 0.212 |
| Smoking | 24 (21.4) | N/M | N/M | N/M |
| Admission for cardiovascular reasons | 30 (26.8) | 2.15 | 1.09–4.25 | 0.027 |
| Pills per day | 5.6 ± 3.9 | N/M | N/M | N/M |
| Habit of walking | 30 (26.8) | 0.57 | 0.25–1.31 | 0.185 |
| FBG (mmol/L) | 8.4 ± 4.4 | N/M | N/M | N/M |
| Creatinine (µmol/L) | 97.2 ± 44.2 | N/M | N/M | N/M |

Notes. HR, hazard ratio; CI, confidence interval; COPD, chronic obstructive pulmonary disease; FBG, fasting blood glucose; N/A, not applicable; N/M, not in the model. Goodness-of-fit of the model: χ² = 24.43, p < 0.001; c-statistic, 0.734 (standard error: 0.049).

The HR of the variables included in the stepwise model were: male gender (HR = 1.84, 95% CI [0.90–3.75], p = 0.095), older age (per 1 year) (HR = 1.04, 95% CI [1.00–1.08], p = 0.031), hypertension (HR = 1.11, 95% CI [0.47–2.62], p = 0.804), renal failure (HR = 2.76, 95% CI [1.01–7.59], p = 0.048), insulin use (HR = 1.56, 95% CI [0.77–3.16], p = 0.212), admission for cardiovascular reasons (HR = 2.15, 95% CI [1.09–4.25], p = 0.027) and having the habit of walking (HR = 0.57, 95% CI [0.25–1.31], p = 0.185). The model obtained with these factors was very significant (p < 0.001). The scores for each variable in the predictive model and the risk groups are shown in Fig. 1. The c-statistic for the scoring system was 0.734 (standard error: 0.049).

Figure 2 shows that there were significant differences in survival between the various risk groups (p < 0.001), with a reduction in survival as the risk category increased.




Figure 1 Four-year risk score for predicting cardiovascular disease in type 2 diabetic inpatients.

DISCUSSION

Summary

This study constructed a predictive model for CVD with good discriminating power (c-statistic = 0.734), indicating which patients with type 2 diabetes who are admitted via the ED have a greater risk of presenting CVD, either fatal or non-fatal.

Strengths and limitations

The main strength of this study is related to the lack of other studies that have constructed short-term predictive models for CVD in patients with type 2 diabetes admitted via the ED. The innovative results can therefore be used to help take decisions to try to avoid the onset of CVD. Additionally, the predictive model constructed had good discriminating power (c-statistic = 0.734), which will enable precise predictions after validation.

Although the sample size was just 112 patients, it was still sufficient for the aims of this study, as the idea was to evaluate the predictive model and its resulting c-statistic, which indicates the discriminating power of the scale constructed (Cooney, Dudina & Graham, 2009). We were therefore very rigorous in designing the model, selecting a maximum number of variables with a stepwise procedure. The results obtained in the model indicate a high degree of significance (p < 0.001 for the goodness of fit), accompanied by a c-statistic above 0.70 (0.734). Furthermore, the statistical power in our sample size calculation was 83.28%.




Figure 2 Survival of the different risk groups for cardiovascular disease of type 2 diabetic inpatients in a Spanish region.

To minimize the possible bias related to measurement and selection, calibrated devices were used and a random sample was selected. However, we were unable to use certain variables that are important in the development of CVD, e.g., obesity, years with diabetes and HbA1c, because the emergency department protocol in our hospital does not include their measurement. If they had been taken into account, then the c-statistic may well have improved. Nevertheless, the resulting value without the inclusion of these variables was satisfactory. Finally, some of the values of the variable walking habit were obtained by statistical imputation, though the procedure used is considered adequate for this type of model (Marshall, Altman & Holder, 2010).

Comparison with existing literature

Others have constructed cardiovascular risk models that have been extensively validated. However, these models were based on the general population, or patients attending their healthcare centre, working persons or volunteers. Our patients, though, differed in their prognostic factors for CVD from the populations used to construct the existing models: a priori, they were all less stable and all had type 2 diabetes. These differences make comparison with current cardiovascular risk tables very difficult. Nonetheless, the c-statistic for internal validation (0.734) is within the range obtained by the other cardiovascular models (0.708–0.82). This indicates that, if our model is validated externally with results similar to the internal validation, it could be used in daily clinical practice (Cooney, Dudina & Graham, 2009).

The prognostic factors for CVD in our study were: insulin, older age, male gender, renal failure, hypertension, habit of walking, and being admitted for cardiovascular reasons. These results corroborate those of other authors, except for the initial admission due to cardiovascular problems (Muggeo et al., 2000; Bo et al., 2005; Hong Kong Diabetes Registry et al., 2008; Kleefstra et al., 2008; Cooney, Dudina & Graham, 2009), although this association is a logical one. Finally, smoking was notably absent from the predictive model, possibly due to the already high underlying cardiovascular risk of these patients (Gil-Guillén et al., 2009).

Implications for research and/or practice

After validation, this study could provide clinical practice with a tool to predict premature cardiovascular morbidity and mortality in patients with type 2 diabetes admitted via the ED. If our results are confirmed by other studies, those patients who have a high likelihood of CVD within four years should be closely followed from the time of their hospital discharge. The control should be based mainly on medication adjustment, control of therapeutic non-compliance, and ensuring a healthy lifestyle (Ramírez-Prado et al., 2015). This validation will require recruiting a new sample of patients and answering two key questions with this sample: firstly, whether the scoring system correctly discriminates between those patients who have CVD and those who do not (using the c-statistic), and secondly, whether the proportion of observed events is similar to that given by the model (using χ² tests). This validation is currently the subject of study in our hospital, and it could also be done in other geographical areas, such that if the two previous conditions are verified, a tool will be available to help reduce the incidence of CVD in patients with similar characteristics to those of the present study sample.

CONCLUSIONS

This study provides a tool that, after validation, will enable short-term cardiovascular morbidity and mortality to be predicted in patients with type 2 diabetes admitted via the ED. This tool should be used by the primary health care services to improve the prognosis, by making more suitable decisions and planning for the needs of the patient, recommending whenever possible that the patient should walk and carrying out stricter control in those patients who present a high cardiovascular risk.

ACKNOWLEDGEMENTS

We thank all the services of the General Hospital of Elda who participated in this study. The authors also thank Ian Johnstone for help with the English language version of the text.

ADDITIONAL INFORMATION AND DECLARATIONS

Funding

The authors declare there was no funding for this work.

Competing Interests

The authors declare there are no competing interests.

Author Contributions

• Dolores Ramírez-Prado conceived and designed the experiments, performed the experiments, wrote the paper, reviewed drafts of the paper.

• Antonio Palazón-Bru conceived and designed the experiments, performed the experiments, analyzed the data, wrote the paper, prepared figures and/or tables, reviewed drafts of the paper.

• David Manuel Folgado-de la Rosa and Damian Robert James Martínez-St. John conceived and designed the experiments, reviewed drafts of the paper.

• María Ángeles Carbonell-Torregrosa and Ana María Martínez-Díaz conceived and designed the experiments, performed the experiments, contributed reagents/materials/analysis tools, reviewed drafts of the paper.

• Vicente Francisco Gil-Guillén conceived and designed the experiments, contributed reagents/materials/analysis tools, reviewed drafts of the paper.

Human Ethics

The following information was supplied relating to ethical approvals (i.e., approving body and any reference numbers):

The study posed no additional risk to the patients and an indirect benefit was expected, as the results might reduce short-term cardiovascular morbidity and mortality in this type of patient. The study was carried out in compliance with the principles of the World Medical Association Declaration of Helsinki and complied with the European Union norms of good clinical practice. The patients were informed verbally about the study and about the information required. The study was approved by the Ethics Committee of the Elda Department of Health (Ref. UI13016).

Supplemental Information

Supplemental information for this article can be found online at http://dx.doi.org/10.7717/peerj.984#supplemental-information.

REFERENCES

American Diabetes Association. 2014. Standards of medical care in diabetes–2014. Diabetes Care 37(Suppl 1):S14–S80 DOI 10.2337/dc14-S014.

Bo S, Ciccone G, Rosato R, Gancia R, Grassi G, Merletti F, Pagano GF. 2005. Renal damage in patients with Type 2 diabetes: a strong predictor of mortality. Diabetic Medicine 22:258–265 DOI 10.1111/j.1464-5491.2004.01394.x.

Bonny A, Lacombe F, Yitemben M, Discazeaux B, Donetti J, Fahri P, Megbemado R, Estampes B. 2008. The 2007 ESH/ESC guidelines for the management of arterial hypertension. Journal of Hypertension 26:825–826 DOI 10.1097/HJH.0b013e3282f857d7.



Carbonell Torregrosa MA, Urtubia Palacios A, Palazón Bru A, Carrasco Tortosa V, Gil Guillén V. 2014. Impacto de la implantación del programa ASIGNA en un servicio de urgencias hospitalario. Emergencias 26:188–194.

Cinza Sanjurjo S, Cabarcos Ortiz de Barrón A, Nieto Pol E, Torre Carballada JA. 2006. Prevalencia de hipertensión arterial en población mayor de 65 años ingresada en un Servicio de Medicina Interna. Anales de Medicina Interna 23:577–581.

Cooney MT, Dudina AL, Graham IM. 2009. Value and limitations of existing scores for the assessment of cardiovascular risk: a review for clinicians. Journal of the American College of Cardiology 54:1209–1227 DOI 10.1016/j.jacc.2009.07.020.

Fan L, Shah MN, Veazie PJ, Friedman B. 2011. Factors associated with emergency department use among the rural elderly. Journal of Rural Health 27:39–49 DOI 10.1111/j.1748-0361.2010.00313.x.

Gil-Guillén VF, Merino-Sánchez J, Sánchez-Ruiz T, Amorós-Barber T, Aznar-Vicente J, Abellán-Alemán J, Llisterri-Caro JL, Orozco-Beltrán D, Pascual Pérez M, Márquez Contreras E. 2009. Valoración del riesgo cardiovascular en la fase longitudinal del estudio Mediterránea. Revista Clínica Española 209:118–130 DOI 10.1016/S0014-2565(09)70877-X.

Hinojosa Mena-Bernal MC, González Sarmiento E, Hinojosa Mena-Bernal J, Zurro Hernández J. 2004. Asistencia urgente del paciente diabético en el área este de la provincia de Valladolid. Anales de Medicina Interna 21:7–11.

Hong Kong Diabetes Registry, Yang X, So WY, Tong PC, Ma RC, Kong AP, Lam CW, Ho CS, Cockram CS, Ko GT, Chow CC, Wong VC, Chan JC. 2008. Development and validation of an all-cause mortality risk score in type 2 diabetes. Archives of Internal Medicine 168:451–457 DOI 10.1001/archinte.168.5.451.

Kleefstra N, Landman GW, Houweling ST, Ubink-Veltmaat LJ, Logtenberg SJ, Meyboom-de Jong B, Coyne JC, Groenier KH, Bilo HJ. 2008. Prediction of mortality in type 2 diabetes from health-related quality of life (ZODIAC-4). Diabetes Care 31:932–933 DOI 10.2337/dc07-2072.

Marshall A, Altman DG, Holder RL. 2010. Comparison of imputation methods for handling missing covariate data when fitting a Cox proportional hazards model: a resampling study. BMC Medical Research Methodology 10:112 DOI 10.1186/1471-2288-10-112.

Martínez-Orozco MJ, Perseguer-Torregrosa Z, Gil-Guillén VF, Palazón-Bru A, Orozco-Beltrán D, Carratalá-Munuera C. 2015. Suitability of antiplatelet therapy in hypertensive patients. Journal of Human Hypertension 29:40–45 DOI 10.1038/jhh.2014.25.

Muggeo M, Zoppini G, Bonora E, Brun E, Bonadonna RC, Moghetti P, Verlato G. 2000. Fasting plasma glucose variability predicts 10-year survival of type 2 diabetic patients: the Verona Diabetes Study. Diabetes Care 23:45–50 DOI 10.2337/diacare.23.1.45.

Ramírez-Prado D, Palazón-Bru A, Folgado-de-la Rosa DM, Carbonell-Torregrosa MA, Martínez-Díaz AM, Gil-Guillén VF. 2015. Predictive models for all-cause and cardiovascular mortality in type 2 diabetic inpatients. A cohort study. International Journal of Clinical Practice 69:474–484 DOI 10.1111/ijcp.12563.

Sullivan LM, Massaro JM, D'Agostino Sr RB. 2004. Presentation of multivariate data for clinical use: the Framingham Study risk score functions. Statistics in Medicine 23:1631–1660 DOI 10.1002/sim.1742.

World Health Organization. 2007. Prevention of cardiovascular disease. Available at http://whqlibdoc.who.int/publications/2007/9789241547178_eng.pdf?ua=1 (accessed May 2014).

World Health Organization. 2014. The top 10 causes of death. Available at http://www.who.int/mediacentre/factsheets/fs310/en/ (accessed January 2015).


Submitted 24 July 2015
Accepted 16 January 2016
Published 15 February 2016

Corresponding author: Antonio Palazón-Bru, [email protected]

Academic editor: Mandeep Mehra

Additional Information and Declarations can be found on page 16

DOI 10.7717/peerj.1673

Copyright 2016 Palazón-Bru et al.

Distributed under Creative Commons CC-BY 4.0

OPEN ACCESS

A method to construct a points system to predict cardiovascular disease considering repeated measures of risk factors

Antonio Palazón-Bru1, Julio Antonio Carbayo-Herencia2, Maria Isabel Vigo3 and Vicente Francisco Gil-Guillén1

1 Department of Clinical Medicine, Miguel Hernández University, San Juan de Alicante, Alicante, Spain
2 Chair of Cardiovascular Risk, San Antonio Catholic University, Murcia, Murcia, Spain
3 Department of Applied Mathematics, University of Alicante, San Vicente del Raspeig, Alicante, Spain

ABSTRACT

Current predictive models for cardiovascular disease based on points systems use the baseline situation of the risk factors as independent variables. These models do not take into account the variability of the risk factors over time. Predictive models for other types of disease also exist that do consider the temporal variability of a single biological marker in addition to the baseline variables. However, due to their complexity these other models are not used in daily clinical practice. Bearing in mind the clinical relevance of these issues and that cardiovascular diseases are the leading cause of death worldwide, we show the properties and viability of a new methodological alternative for constructing cardiovascular risk scores to make predictions of cardiovascular disease with repeated measures of the risk factors while retaining the simplicity of the points systems so often used in clinical practice (construction, statistical validation by simulation and explanation of potential utilization). We have also applied the system clinically to a set of simulated data solely to help readers understand the procedure constructed.

Subjects Cardiology, Epidemiology, Public Health, Statistics

Keywords Cardiovascular diseases, Cardiovascular models, Risk factors, Cohort studies

How to cite this article: Palazón-Bru et al. (2016), A method to construct a points system to predict cardiovascular disease considering repeated measures of risk factors. PeerJ 4:e1673; DOI 10.7717/peerj.1673

INTRODUCTION

Given that cardiovascular diseases (CVD) are one of the main causes of death in the world (World Health Organization, 2014), prediction models are of interest for determining those risk factors that can be acted on to reduce the probability of CVD (Molinero, 2003). The simplest model to make predictions about a dichotomous event, such as CVD, is logistic regression (Hosmer & Lemeshow, 2000). This model produces an equation which, once the values for the various risk factors are known, can be used to evaluate the likelihood of the appearance of disease. However, this sort of model fails to consider exposure time. This is precisely what is done in survival models, which analyse the time of occurrence of a particular event. Although the best known of these models is the Cox model (Hosmer, Lemeshow & May, 2008), it is not the only alternative available. There exist other possible methods to analyse survival, called parametric models as they assume a concrete type of distribution, such as the Weibull model, used in the SCORE project (Conroy et al., 2003). Indeed, the Framingham study used both logistic regression models and survival models (parametric and non-parametric) (National Heart, Lung, and Blood Institute, 2015).

In conjunction with the Framingham and SCORE predictive models, others have been developed that are also used in clinical practice, though to a lesser extent, such as the Reynolds risk score and the WHO/ISH score (Cooney, Dudina & Graham, 2009). Common to all these is the making of predictions about CVD over a 10-year period, though they consider different outcomes (morbidity and mortality with coronary heart disorders, mortality from coronary heart disorders, cardiovascular morbidity and mortality, or just cardiovascular mortality) and use different mathematical models (Cox and Weibull). These models enable physicians to make long-term decisions for their patients. In addition, the clinical practice guidelines recommend using these predictive models to stratify the cardiovascular risk of patients. For example, in Europe, the European Guidelines on cardiovascular disease prevention in clinical practice indicate "A risk estimation system such as SCORE can assist in making logical management decisions, and may help to avoid both under- and overtreatment" (Perk et al., 2012). In other words, clinicians follow the guidelines to improve the decision-making process in order to prevent CVD, and it is these very guidelines that indicate the use of these predictive models. Accordingly, these models are very relevant in daily clinical practice.

Given the complexity of these mathematical models, an algorithm is used to enable the clinician to understand them more easily, though precision is lost in the estimation of the probability of CVD (Sullivan, Massaro & D'Agostino, 2004). To do this, the mathematical models have been transformed into coloured risk tables that can be used systematically in clinical practice. However, these tables are based on models that manage clinical variables in the baseline situation of the patient (Conroy et al., 2003; National Heart, Lung, and Blood Institute, 2015), and do not therefore take into account the variability of the variables over time, as the biological parameters are considered constant over the follow-up period when in fact they vary greatly and the physician can intervene using drugs to either reduce or increase their value (National Cholesterol Education Program, 2002; American Diabetes Association, 2014; James et al., 2014; Stone et al., 2014).

Predictive models for survival in other diseases do consider the temporal variability of a single biological marker (as well as the baseline variables). These are known as Joint Models for Longitudinal and Time-to-Event Data and comprise two parts: (1) a mixed linear model to determine the path of a longitudinal parameter; and (2) a survival model relating the baseline variables and the longitudinal parameter with the appearance of an event. These models can be used to make more precise predictions about the development of a disease (Rizopoulos, 2012). However, due to their complexity they are not used in general clinical practice. In addition, joint modelling when the survival part is formed by a linear function with multiple longitudinal parameters (usual modelling in traditional survival analysis in the health sciences) has only been examined theoretically and currently remains a complete computational challenge. This has resulted in the development of algorithms to make predictions, as in the univariate case (Rizopoulos, 2011).



Here we aim to show the viability and properties of a new methodological alternative for constructing cardiovascular risk scores (construction, statistical validation by simulation and potential utilization with the new theoretical model) dealing with the temporal variability of CVD risk factors. We also apply the model using a set of simulated data, with the sole purpose of helping readers understand how to apply it to a real data set with repeated measures of cardiovascular risk factors. In other words, the example given using simulated data is only to show how to apply the method proposed with a real data set having the characteristics given in this work. Thus, the scoring system given here has no value in clinical practice; what is of value is the way the system is constructed.

MATERIALS AND METHODS

The basic models used to develop the new method were the Cox model with time-dependent variables, the points system of the Framingham Heart Study, Joint Models for Longitudinal and Time-to-Event Data, and predictions of the longitudinal biomarkers using these Joint Models.

Cox model with time-dependent variables

Let $T$ be a non-negative random variable denoting the observed failure time, which is the minimum of the true event time $T^*$ and the censoring time $C$ (non-informative right censoring). In other words, $T = \min(T^*, C)$. In addition, we define $\delta$ as the event indicator, which takes the value 1 if $T^* \le C$ and 0 otherwise. Let $W$ be the vector of baseline covariates and $Y(t)$ the vector of time-dependent covariates, assumed to have a defined value for all $t \ge 0$. With these data, the Cox model with time-dependent variables takes the following form (risk function):

$$h(t \mid w, y(t)) = h_0(t)\exp\{\gamma^T w + \alpha^T y(t)\},$$

where $h_0(t)$ is the baseline risk function, and $\gamma$ and $\alpha$ are the vectors of the regression coefficients for the baseline and time-dependent covariates, respectively (Andersen & Gill, 1982).
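To make the risk function concrete, the following minimal sketch evaluates it for one subject before and after a change in a time-dependent covariate. The coefficients, covariates and baseline hazard value are hypothetical and chosen only for illustration; the function name `hazard` is not from the source.

```python
import math

def hazard(h0_t, gamma, w, alpha, y_t):
    """h(t | w, y(t)) = h0(t) * exp(gamma^T w + alpha^T y(t))."""
    linear_predictor = sum(g * x for g, x in zip(gamma, w))
    linear_predictor += sum(a * y for a, y in zip(alpha, y_t))
    return h0_t * math.exp(linear_predictor)

# Hypothetical values, not estimates from any data set.
gamma = [0.04, 0.6]    # baseline covariates: age (years), male sex (0/1)
alpha = [0.02]         # time-dependent covariate: systolic blood pressure (mmHg)
w = [60, 1]

h_before = hazard(0.001, gamma, w, alpha, [150])   # pressure of 150 mmHg at time t
h_after = hazard(0.001, gamma, w, alpha, [130])    # pressure lowered to 130 mmHg
hazard_ratio = h_after / h_before                  # baseline terms cancel
```

Because the baseline hazard and the baseline covariates are identical in both calls, the ratio depends only on the change in the time-dependent covariate, i.e., it equals exp(0.02 × (130 − 150)) in this example.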

The estimation of the model parameters is based on the partial likelihood function (Andersen & Gill, 1982). We also have to check whether the functional form of the covariates in the model is linear. This should be done using graphical methods (martingale residuals against the covariate of interest). Finally, we have to assess whether the model fits the data well, through the analysis of the Cox-Snell residuals (graphical test).

The classical Cox regression model (with no time-varying covariates) drops $\alpha$ and $y(t)$ from the above expression. Furthermore, the model has to satisfy the following condition (proportional hazards assumption):

$$\log\left(\frac{h(t \mid w)}{h_0(t)}\right) = \gamma^T w.$$
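As a short worked step, the condition implies that the hazard ratio between two covariate profiles does not depend on $t$, because the baseline risk function cancels. The profiles $w_1$ and $w_2$ are introduced here only for illustration:

```latex
\frac{h(t \mid w_1)}{h(t \mid w_2)}
  = \frac{h_0(t)\exp\{\gamma^T w_1\}}{h_0(t)\exp\{\gamma^T w_2\}}
  = \exp\{\gamma^T (w_1 - w_2)\}.
```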




Points system in the Framingham Heart Study

We summarize the steps of the method developed by the Framingham Heart Study to adapt a Cox regression model with $p$ covariates to risk charts (Sullivan, Massaro & D'Agostino, 2004):

(1) Estimate the parameters of the model: $\gamma$.

(2) Organize the risk factors into categories and determine reference values:

(a) Continuous risk factor (e.g., age): set up contiguous classes and determine reference values for each. Example for age: 18–30 [24], 30–39 [34.5], 40–49 [44.5], 50–59 [54.5], 60–69 [64.5] and ≥70 years [74.5]. In brackets is the reference value. The Framingham Heart Study researchers recommend mid-points as acceptable reference values, and for the first and last class the mean between the extreme value and the 1st (first class) or 99th percentile (last class).

(b) Binary risk factors (e.g., gender, 0 for female and 1 for male): the reference value is again either 0 or 1.

Let $W_{ij}$ denote the reference value for category $j$ of risk factor $i$, where $i = 1,\dots,p$ and $j = 1,\dots,c_i$ (total number of categories for risk factor $i$).

(3) Determine the referent risk factor profile: the base category will have 0 points in the scoring system and its reference value will be denoted as $W_{iREF}$, $i = 1,\dots,p$.

(4) Determine how far each category is from the base category in regression units: calculate $\gamma_i \cdot (W_{ij} - W_{iREF})$, $i = 1,\dots,p$ and $j = 1,\dots,c_i$.

(5) Set the fixed multiplier or constant $B$: the number of regression units equivalent to 1 point in the points system. The Framingham Heart Study generally uses the increase in risk associated with a 5-year increase in age.

(6) Determine the number of points for each of the categories of each risk factor: the closest integer to $\gamma_i \cdot (W_{ij} - W_{iREF})/B$.

(7) Determine risks associated with point totals:

$$1 - S_0(t)^{\exp\left\{\sum_{i=1}^{p} (\gamma_i \cdot W_{iREF}) + B \cdot \mathrm{Points} - \sum_{i=1}^{p} \gamma_i \cdot \hat{w}_i\right\}},$$

where $S_0(t)$ is calculated through the Kaplan–Meier estimator.
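The steps above can be sketched in a few lines. This is a minimal illustration under stated assumptions, not the Framingham implementation: the coefficients, the baseline survival and the covariate means are hypothetical, and the last sum in step (7) is read here as using the sample mean of each risk factor, following the usual presentation in Sullivan, Massaro & D'Agostino (2004), since the symbol is not defined in the text.

```python
import math

def category_points(gamma_i, reference_values, base_value, B):
    """Steps 4 and 6: distance from the base category in regression units, scaled by B, rounded."""
    # round() resolves exact .5 ties to the even integer; the text only says "closest integer".
    return [round(gamma_i * (w - base_value) / B) for w in reference_values]

def risk_from_points(points, B, s0_t, gammas, base_values, covariate_means):
    """Step 7: 1 - S0(t) ** exp(sum(gamma_i * W_iREF) + B * points - sum(gamma_i * mean_i))."""
    exponent = sum(g * r for g, r in zip(gammas, base_values)) + B * points
    exponent -= sum(g * m for g, m in zip(gammas, covariate_means))
    return 1.0 - s0_t ** math.exp(exponent)

# Hypothetical coefficients for age (per year) and male sex; not from a fitted model.
gamma_age, gamma_male = 0.05, 0.6
B = 5 * gamma_age                                   # step 5: 1 point = 5-year increase in age
age_refs = [24, 34.5, 44.5, 54.5, 64.5, 74.5]       # reference values from step 2(a)
age_points = category_points(gamma_age, age_refs, 24, B)
male_points = category_points(gamma_male, [0, 1], 0, B)

gammas, bases, means = [gamma_age, gamma_male], [24, 0], [50.0, 0.5]
risk_low = risk_from_points(0, B, 0.95, gammas, bases, means)
risk_high = risk_from_points(12, B, 0.95, gammas, bases, means)
```

With these illustrative numbers each age class adds two points over the previous one, and the estimated risk rises with the point total, which is the behaviour the risk chart is meant to convey.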

Joint models for longitudinal and time-to-event data

Using the former notation, we have the random variables vector $\{T, W, Y(T)\}$, where $Y(T)$ is now a single time-dependent variable (longitudinal outcome) whose values are defined only intermittently in $t$. In other words, for a subject $i$ ($i = 1,\dots,n$), $y(t)$ is only defined at the times $t_{ij}$ ($j = 1,\dots,n_i$), giving $y_i(t_{ij})$, where $0 \le t_{i1} \le t_{i2} \le \dots \le t_{in_i}$. We denote by $m(t)$ the true and unobserved value of the longitudinal outcome at time $t$ ($m_i(t)$ for subject $i$). To assess the effect of $m(t)$ on the event risk, a standard option is to fit a Cox regression model with one time-dependent covariate:

$$h(t \mid M(t), w) = h_0(t)\exp\{\gamma^T w + \alpha\, m(t)\},$$

where $M(t)$ for a subject $i$ is defined as $M_i(t) = \{m_i(u); 0 \le u < t\}$, which denotes the history of the true unobserved longitudinal process up to time $t$. The other parameters in the expression follow the structure of the Cox regression model with time-dependent variables (see former section). The baseline risk function can be left unspecified or can be approximated with splines or step functions (Rizopoulos, 2012).




In the above expression, we have used $m(t)$ as the true unobserved longitudinal process. However, in our sample we have $y(t)$; therefore, we will estimate $m(t)$ using $y(t)$ through a linear mixed effects model to describe the subject-specific longitudinal evolutions:

$$\begin{aligned}
y_i(t) &= m_i(t) + \varepsilon_i(t) \\
m_i(t) &= x_i^T(t)\beta + z_i^T(t) b_i \\
b_i &\sim N(0, D) \\
\varepsilon_i(t) &\sim N(0, \sigma^2),
\end{aligned}$$

where $\beta$ and $b_i$ denote the vectors of regression coefficients for the unknown fixed-effects parameters and the random effects respectively, $x_i(t)$ and $z_i(t)$ denote row vectors of the design matrices for the fixed and random effects respectively, and $\varepsilon_i(t)$ is the error term with variance $\sigma^2$. Finally, $b_i$ follows a normal distribution with mean 0 and covariance matrix $D$, and is independent of $\varepsilon_i(t)$ (Rizopoulos, 2012).
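The longitudinal submodel can be illustrated by simulating one subject. The sketch below assumes the simplest common design, a random intercept and random slope with $x_i(t) = z_i(t) = (1, t)$, and a diagonal covariance matrix $D$ (independent random effects); these choices and all numerical values are assumptions made for illustration only.

```python
import random

def simulate_subject(times, beta, sd_b, sigma, rng):
    """Return (true path m_i(t), observed values y_i(t)) at the given measurement times."""
    b0 = rng.gauss(0.0, sd_b[0])          # random intercept, b_i0 ~ N(0, sd_b[0]^2)
    b1 = rng.gauss(0.0, sd_b[1])          # random slope, b_i1 ~ N(0, sd_b[1]^2)
    true_path = [(beta[0] + b0) + (beta[1] + b1) * t for t in times]
    observed = [m + rng.gauss(0.0, sigma) for m in true_path]   # add measurement error
    return true_path, observed

rng = random.Random(42)
visit_times = [0.0, 0.5, 1.2, 2.0]        # intermittent visits t_ij (years), hypothetical
m_i, y_i = simulate_subject(visit_times, beta=(140.0, 1.5), sd_b=(10.0, 1.0),
                            sigma=5.0, rng=rng)
```

The distinction between `m_i` and `y_i` mirrors the text: the survival submodel is linked to the unobserved path, while only the noisy values are recorded at the visits.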

The estimation of the parameters of the joint models is based on a maximum likelihood approach that maximizes the log-likelihood function corresponding to the joint distribution of the time-to-event and longitudinal outcomes (Rizopoulos, 2012).

Regarding the assumptions of the model, we have to assess them for both submodels (longitudinal and survival) using residual plots. For the longitudinal part, we will plot the subject-specific residuals versus the corresponding fitted values, the Q–Q plot of the subject-specific residuals, and the marginal residuals versus the fitted values. For the survival part, we will plot the subject-specific fitted values for the longitudinal outcome versus the martingale residuals, and finally we will determine graphically whether the Cox-Snell residuals are a censored sample from a unit exponential distribution (Rizopoulos, 2012). Regarding the last component (random effects part) of the joint model for which we have indicated an assumption, other authors have shown that linear mixed-effects models are relatively robust to misspecification of this distribution (Verbeke & Lesaffre, 1997).

Predictions of the longitudinal biomarkers using these joint models for longitudinal and time-to-event data

Let $\{t_i, \delta_i, w_i, y_i(t_{ij}), 0 \le t_{ij} \le t_i, j = 1,\dots,n_i\}$, $i = 1,\dots,n$, be a random sample of the random variables vector $\{T, \Delta, W, Y\}$, using the former notation. A joint model has been fitted using this sample. Now, we are interested in predicting the expected value of the longitudinal outcome at time $u > t$ for a new subject $i$ who has a history up to time $t$ of the observed longitudinal marker $Y_i(t) = \{y_i(s); 0 \le s < t\}$:

$$\omega_i(u \mid t) = E_Y\{y_i(u) \mid t_i^* > t, Y_i(t), w_i; \theta\},$$

where $\theta$ denotes the parameters' vector of the joint model (Rizopoulos, 2011).

Rizopoulos developed a Monte Carlo approach to perform this task, based on a Bayesian formulation. He obtained the following simulation scheme (Rizopoulos, 2011):

Step 1: Draw $\theta^{(l)} \sim N(\hat{\theta}, \widehat{\mathrm{var}}(\hat{\theta}))$.

Step 2: Draw $b_i^{(l)} \sim \{b_i \mid t_i^* > t, Y_i(t), w_i; \theta^{(l)}\}$.

Step 3: Compute $\omega_i^{(l)}(u \mid t) = x_i^T(u)\beta^{(l)} + z_i^T(u) b_i^{(l)}$.



This scheme should be repeated $L$ times. The estimate of the parameter is the mean (or median) of the calculated values ($\omega_i^{(l)}(u \mid t)$, $l = 1,\dots,L$) and the confidence interval is formed by the percentiles (95%: 2.5% and 97.5% percentiles) (Rizopoulos, 2011).

We highlight that these predictions have a dynamic nature; that is, as time progresses additional information is recorded for the patient, so the predictions can be updated using this new information.
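The three-step scheme and the percentile summary can be sketched as follows. This is a deliberately simplified stand-in, not Rizopoulos's algorithm: the fixed effects are drawn independently around hypothetical estimates, and step 2 is replaced by a normal draw with a supplied mean and standard deviation, whereas the real conditional distribution of the random effects depends on the subject's survival and observed history. The design is again assumed to be $x_i(u) = z_i(u) = (1, u)$, and every name and number is hypothetical.

```python
import random
import statistics

def predict_marker(u, beta_hat, beta_se, b_post_mean, b_post_sd, L=2000, seed=1):
    """Monte Carlo prediction of the marker at time u: median and 2.5%/97.5% percentiles."""
    rng = random.Random(seed)
    draws = []
    for _ in range(L):
        # Step 1: draw the fixed effects around their estimates (independence assumed).
        beta = [rng.gauss(m, s) for m, s in zip(beta_hat, beta_se)]
        # Step 2: draw the subject's random effects (stand-in normal distribution).
        b = [rng.gauss(m, s) for m, s in zip(b_post_mean, b_post_sd)]
        # Step 3: omega = x(u)^T beta + z(u)^T b with x(u) = z(u) = (1, u).
        draws.append((beta[0] + b[0]) + (beta[1] + b[1]) * u)
    cuts = statistics.quantiles(draws, n=40)   # cut points every 2.5%
    return statistics.median(draws), cuts[0], cuts[-1]

median_pred, lower, upper = predict_marker(
    u=2.0, beta_hat=[140.0, 1.5], beta_se=[1.0, 0.2],
    b_post_mean=[5.0, 0.5], b_post_sd=[2.0, 0.3])
```

Re-running the function with updated inputs as new measurements arrive reflects the dynamic nature of the predictions noted in the text.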

Construction

We wish to determine the probability of having CVD from a baseline situation ($t = 0$) up to a fixed point in time ($t$), given a series of risk factors measured at baseline and during this follow-up. This requires the following steps:

(1) Fit a Cox regression model with time-dependent variables. As we are unable to estimate a joint model with multiple longitudinal parameters (Rizopoulos, 2012), we use the classic extended Cox model (with no shared structure), which requires knowing the values of all the longitudinal parameters at any value of $t$. As these values are not known because the parameters are recorded intermittently, we take the last recorded value in time as a reference.

(2) Use the procedure of the Framingham study to adapt the coefficients of the model obtained to a points system and determine the probabilities of CVD for each score up to the moment $t$. We then use these probabilities to construct risk groups that are easy for the clinician to understand (for example, in multiples of 5%) (Sullivan, Massaro & D'Agostino, 2004).

(3) Fit a joint model for longitudinal and time-to-event data for each longitudinal parameter recorded during the follow-up. This will also include all the baseline variables. These models are constructed to make predictions about the longitudinal parameters in new patients (statistical validation by simulation and potential utilization).
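Step (1) relies on carrying the last recorded value forward between visits. A common way to prepare such data for an extended Cox model is a start/stop layout with one row per interval; the sketch below shows that layout for a single subject. The function name, the field names and the measurements are hypothetical, and the layout is a widely used convention rather than something the text prescribes.

```python
def to_intervals(measure_times, values, end_time, event):
    """Split follow-up into (start, stop] rows, carrying the last recorded value forward."""
    rows = []
    for k, (start, value) in enumerate(zip(measure_times, values)):
        if start >= end_time:
            break                                   # measurement taken after follow-up ended
        next_time = measure_times[k + 1] if k + 1 < len(measure_times) else end_time
        stop = min(next_time, end_time)
        rows.append({"start": start, "stop": stop, "value": value,
                     "event": int(event and stop == end_time)})  # event only in the last row
    return rows

# Systolic blood pressure recorded at 0, 1 and 2.5 years; CVD event at 3 years (hypothetical).
rows = to_intervals([0.0, 1.0, 2.5], [150, 142, 138], end_time=3.0, event=True)
```

Each row then holds the covariate value assumed to apply over its interval, which is how the intermittently recorded parameter becomes defined for every value of t.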

Statistical validation by simulation

Once the points system has been constructed, we wish to see whether the model determines the onset of CVD accurately in a different set of subjects (validation sample). In this validation sample we know the longitudinal markers up to the point $t = 0$ (record of cardiovascular risk factors in the clinical history which were measured before the baseline situation ($t < 0$)) and the value of the variables at baseline. With this information we determine the probability each subject has of experiencing an event, and we then compare this with what actually occurred; i.e., we determine whether the model is valid. To determine this validity we follow these steps:

(1) Determine $L$ simulations of the longitudinal parameters at the time point $t$ using the models mentioned in step (3) of construction, from the history ($t < 0$) and the baseline variables ($t = 0$) (Rizopoulos, 2011). We will use these simulated values to construct a distribution of the points for each sample subject. Thus, each subject will have $L$ values for the points variable (evaluating the points system using the simulated values and the baseline variables is sufficient), and for each $l$th simulation each patient will have a points score. In other words, each simulation will have a distribution of the points variable.




(2) For each lth simulation fit a classic Cox model (without time-dependent variables), using the score obtained as the only explanatory variable. Determine Harrell’s concordance statistic for each of these L models. These values will give us a distribution of values for this statistic, from which we calculate the mean (or the median) and the 2.5% and 97.5% percentiles (Rizopoulos, 2011). In this way we construct a confidence interval for this statistic, which will indicate the discriminating capacity of the points system to determine which patients will develop CVD.

(3) Calculate the median of the points distribution for each patient in the validation sample. Note that we do not use the mean, as it could contain decimals and this makes no sense when applying the scoring system. Using these medians, classify each patient in a risk group and compare the rate of events predicted by the points system in each group to the actual observed rate. The test used for this process will be the Pearson χ² test.

The concordance statistic used has been reported to have various limitations (Lloyd-Jones, 2010). For example, it does not compare whether the estimated and observed risks are similar in the subjects. Accordingly, we have added the analysis of the differences between the expected events and the observed events, which minimises this particular problem. In addition, it is very sensitive to large hazard ratio values (≥9). Nonetheless, we have to consider that, as all the variables are quantitative (not categorized), the hazard ratio values do not surpass this threshold. Accordingly, the joint analysis of Harrell’s concordance index and the differences between the expected and the observed events enables us to validate the proposed model statistically by simulation.
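Steps (1) and (2) of the validation can be illustrated with a short sketch. The block below is written in Python rather than R and is not the authors' implementation: the function names (`harrell_c`, `percentile`, `c_statistic_interval`) and the toy data are assumptions made for this example. Instead of fitting a Cox model in each simulation, it computes the concordance directly from the score, which should give the same value whenever the single Cox coefficient is positive, because the linear predictor is then a monotone function of the score.

```python
import random
from statistics import median

def harrell_c(times, events, scores):
    """Harrell's concordance: share of usable pairs in which the
    subject with the earlier observed event has the higher score."""
    concordant = 0.0
    usable = 0
    n = len(times)
    for i in range(n):
        for j in range(n):
            # Usable pair: subject i has an observed event before time j.
            if events[i] and times[i] < times[j]:
                usable += 1
                if scores[i] > scores[j]:
                    concordant += 1.0
                elif scores[i] == scores[j]:
                    concordant += 0.5  # tied scores count as half
    return concordant / usable

def percentile(values, q):
    """Percentile by linear interpolation, q in [0, 100]."""
    ordered = sorted(values)
    pos = (len(ordered) - 1) * q / 100.0
    low = int(pos)
    high = min(low + 1, len(ordered) - 1)
    return ordered[low] + (ordered[high] - ordered[low]) * (pos - low)

def c_statistic_interval(times, events, simulated_scores):
    """simulated_scores[l][i] is the score of subject i in simulation l."""
    c_values = [harrell_c(times, events, s) for s in simulated_scores]
    return median(c_values), percentile(c_values, 2.5), percentile(c_values, 97.5)

# Toy validation sample (hypothetical numbers, for illustration only).
random.seed(1)
times = [0.4, 1.1, 1.6, 2.0, 2.0, 2.0]            # years of follow-up
events = [True, True, True, False, False, False]  # CVD observed?
base_scores = [17, 15, 14, 11, 10, 9]
simulated_scores = [
    [s + random.choice([-1, 0, 1]) for s in base_scores]
    for _ in range(100)                           # L = 100 simulations
]
c_med, c_low, c_high = c_statistic_interval(times, events, simulated_scores)
```

The quadratic pair loop keeps the definition visible; for a large validation sample a dedicated survival library would be the more practical choice.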

Explanation of potential utilization

Once the points system has been validated statistically the clinician can then apply the system to determine the cardiovascular risk in a new patient, and take any necessary measures to reduce this risk. The healthcare professional will already have historical information about the longitudinal parameters (t < 0) and information about the baseline situation (t = 0) of the new patient. The steps to be followed by the clinician are:

(1) Determine the value of each longitudinal parameter at the time t. To do this we apply the models obtained in step (3) of construction to the history and the baseline situation of the new patient, in order to determine L simulations for each longitudinal parameter, similar to what was done in the validation process. For each lth simulation we determine the score corresponding to the profile of cardiovascular risk factors obtained (simulated and baseline information values). This will give us a points distribution for the new patient.

(2) Determine the median and the 2.5% and 97.5% percentiles of the points vector constructed above. The median will be the estimate of the score for the new patient and the percentiles will define the confidence interval (Rizopoulos, 2011). As each score has an associated risk, the healthcare professional will be able to know the probability of CVD at time t, together with its confidence interval. Finally, the clinician will know the values of the biological parameters at t that correspond to the median of the points system. This way the clinician will be able to see which of these parameters has a score above normal; i.e., see the possible areas of intervention to reduce the cardiovascular risk.




(3) The clinician now knows the cardiovascular risk and which parameters have a score above normal, so he or she can then design the best intervention for that patient. This presents a problem, as we need to know the value of each biological parameter at time t; i.e., the clinician knows an approximation based on simulations constructed from the patient history but does not know how the interventions will affect the cardiovascular risk.

From the previous step the clinician knows the parameters on which to act and the history of these parameters as well as the baseline situation. From these measurements the clinician can establish a realistic objective for the next patient visit at time t̃ (0 < t̃ < t). The clinician now inserts the desired value of the biological parameter at t̃ and determines its value at time t; i.e., determines L simulations for each cardiovascular risk factor using the previous models (step 3 of construction), adding a new value to the history (t̃).

These calculations will give the benefit of the intervention (estimate (mean or median) of the biological parameter at t) and the clinician will be able to see from the points system how the patient’s risk will be reduced.
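The summary in step (2) of the clinician's procedure reduces to a median and two percentiles of the simulated points vector. A minimal sketch, assuming the L simulated scores are already available as a list of integers; the names `nearest_rank` and `summarise_points` and the sample values are hypothetical, and the nearest-rank percentile is only one of several possible conventions.

```python
import math
from statistics import median_low

def nearest_rank(ordered, q):
    """Nearest-rank percentile of an already sorted list, q in (0, 100]."""
    rank = max(1, math.ceil(q / 100.0 * len(ordered)))
    return ordered[rank - 1]

def summarise_points(points):
    """Score estimate and percentile interval from simulated scores."""
    ordered = sorted(points)
    return {
        # median_low always returns an observed value, so the estimate
        # stays a whole number of points even when L is even.
        "score": median_low(ordered),
        "ci_low": nearest_rank(ordered, 2.5),
        "ci_high": nearest_rank(ordered, 97.5),
    }

# Hypothetical points vector from L = 100 simulations.
points = [15] * 20 + [16] * 60 + [17] * 20
summary = summarise_points(points)
```

Running the same function on the points vector obtained after adding an intervention target at t̃ gives the second summary that the clinician would compare against the first.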

Simulation on a data set

With the sole purpose of explaining how to use the method proposed here, we have simulated a data set upon which to apply each of the steps described above. Note that we are in fact going to simulate two data sets, one to construct the model and the other to validate it statistically via simulation. So that both sets are biologically plausible we have used estimations obtained in the Puras-GEVA cardiovascular study, which has been published in Medicine (Artigao-Ródenas et al., 2015).

Our data sets will include the following biological parameters: age (years), systolic blood pressure (SBP) (mmHg), HbA1c (%), atherogenic index, gender (male or female) and smoking (yes or no). Of these, the SBP, HbA1c and the atherogenic index will be present at baseline (t = 0) and in the follow-up for the construction sample (t > 0) or recorded in the clinical history for the statistical validation sample via simulation (t < 0). The choice to include these variables was based on the current cardiovascular risk scales (Conroy et al., 2003; National Heart, Lung, and Blood Institute, 2015), except for HbA1c, which is used instead of a diagnosis of diabetes mellitus in order to include another time-dependent parameter in the final model; this also enables us to assess the control of diabetes mellitus (HbA1c < 6.5%) when preventing CVD.

For the main variable (time-to-CVD) we shall suppose that our cohort is used to predict CVD with a follow-up of 2 years. Note that the traditional cardiovascular risk scales use a time of 10 years (Conroy et al., 2003; National Heart, Lung, and Blood Institute, 2015). We have used this lower value because we are going to make predictions for the longitudinal parameters from the baseline situation (t = 0) up to the prediction time, and if we take a prediction time of 10 years the predictions for the longitudinal parameters will vary greatly and not allow us to make precise predictions about which patient will develop CVD, which would negate the usefulness of the method proposed here. Nevertheless, the fact that the predictions for the longitudinal parameters have a dynamic character (see Predictions of the longitudinal biomarkers using these joint models for longitudinal and time-to-event data) enables us to determine the risk at 2 years with greater precision whenever the patient attends the office of the healthcare professional. Note that the method proposed here has been developed for a theoretical time period t but it can be applied for any time period. Nonetheless, generally speaking the longitudinal parameters would vary more over longer time periods, though this clearly depends on the nature of the data, both at the individual level and the population level (Rizopoulos, 2012).

The work used for our simulated data set developed and validated a predictive model of CVD (angina of any kind, myocardial infarction, stroke, peripheral arterial disease of the lower limbs, or death from CVD), to enable calculation of risk in the short, medium and long term (the risk associated with each score was calculated every 2 years up to a maximum of 14) in the general population (Artigao-Ródenas et al., 2015). Table 4 of this scoring system shows the importance of this question. For example, a patient with a score of 9 points has a probability of CVD at 2 years of 0.67%, whereas at 10 years this rises to 5.16% (Artigao-Ródenas et al., 2015: Table 4). If we regularly calculate the 2-year risk of CVD for our patient and the score remains the same then no new therapeutic action will be taken (risk < 1%), whereas if we only calculate the risk once every 10 years we will take aggressive therapeutic measures when the patient first attends the office, as the score will correspond to a cut point defined as high in the SCORE project (5% → one in 20 patients) (Conroy et al., 2003). We see, then, that a regular short-term prediction could lead to a change in the therapeutic decisions regarding prevention of CVD, provided of course that the possibility exists of calculating the risk regularly. As the risk table given in the Puras-GEVA study includes predictions for 4, 6, 8, 10, 12 and 14 years, we selected the lowest cut point because if we had to make predictions for a longer time the dispersion could have increased (Rizopoulos, 2012). This is why we chose this cut point of 2 years for the simulation.

The longitudinal follow-up measurements (construction sample) assumed that the patient attends the physician’s office once every 3 months for measurements of SBP, HbA1c and the atherogenic index. This is done until the end of the follow-up for each patient. The statistical validation sample using simulation supposes that there is a certain probability of having records in the clinical history of all the longitudinal parameters every 3 months for 5 years retrospectively (t < 0). The probability is different for each of the visits and will depend on each patient. In other words, we will have intermittent measurements of all these parameters from t = −5 years to t = 0.
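The retrospective visit schedule can be pictured with a short sketch. It assumes a single fixed recording probability per patient (`p_record`, a hypothetical parameter) and expresses time in months; in the paper the probability differs by visit and by patient, as detailed in S1, so this is only a simplified picture of how intermittent histories arise. The function names are also illustrative.

```python
import random

def history_grid(years=5, step_months=3):
    """Possible visit times in months relative to baseline: -60, -57, ..., -3, 0."""
    return list(range(-12 * years, 1, step_months))

def simulate_history(p_record, rng):
    """Keep each past visit with probability p_record.
    The baseline visit (t = 0) is always kept, since baseline values are known."""
    return [t for t in history_grid() if t == 0 or rng.random() < p_record]

rng = random.Random(42)          # fixed seed so the example is reproducible
visits = simulate_history(p_record=0.4, rng=rng)
```

Each retained time point would then carry one simulated value of SBP, HbA1c and the atherogenic index.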

The Supplemental Information (S1) details all the mathematical formulae used to construct our data sets, always based on the Puras-GEVA study (Artigao-Ródenas et al., 2015). The simulation was done using R 2.13.2 and IBM SPSS Statistics 19.

One could think that by using a shorter time period of just 2 years there would be no variability in the cardiovascular risk factors. However, in S1 we can see that the models used show a temporal variability in the risk factors. If there were no variability in the factors, the models would contain only the constant with a very small random error. In other words, using this prediction time makes sense.

We decided to use a simulated data set as we did not have any data set with real data available. This way of explaining a new method has already been used by others working with joint models, as the only objective of the simulated data set is to explain how to apply the new method (Faucett & Thomas, 1996; Henderson, Diggle & Dobson, 2000; Wang & Taylor, 2001; Brown, Ibrahim & DeGruttola, 2005; Zeng & Cai, 2005; Vonesh, Greene & Schluchter, 2006; Rizopoulos & Ghosh, 2011).

Table 1 Parameters (βs) of the multivariate Cox regression model. Goodness-of-fit (likelihood ratio test): χ² = 912.3, p < 0.001.

| Variable | β | p-value |
|---|---|---|
| Age (baseline) (per 1 year) | 0.0846 | <0.001 |
| SBP (per 1 mmHg) | 0.00874 | <0.001 |
| HbA1c (per 1%) | 0.188 | <0.001 |
| Atherogenic index (per 1 unit) | 0.191 | <0.001 |
| Male gender | 0.479 | 0.001 |
| Smoker (baseline) | 0.721 | <0.001 |

Notes. Abbreviations: SBP, systolic blood pressure; HbA1c, glycated haemoglobin.
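For readers who prefer hazard ratios to coefficients, the β values in Table 1 can be exponentiated. The sketch below does this and also evaluates the relative hazard between two hypothetical patient profiles; the profiles, the dictionary keys and the function name `relative_hazard` are illustrative assumptions, and the calculation ignores the baseline hazard, so it yields a ratio and not an absolute risk.

```python
import math

# Coefficients copied from Table 1 (Cox model with time-dependent variables).
BETA = {
    "age": 0.0846,               # per 1 year
    "sbp": 0.00874,              # per 1 mmHg
    "hba1c": 0.188,              # per 1 %
    "atherogenic_index": 0.191,  # per 1 unit
    "male": 0.479,               # 1 = male, 0 = female
    "smoker": 0.721,             # 1 = smoker, 0 = non-smoker
}

# Hazard ratio per unit increase of each variable: HR = exp(beta).
hazard_ratios = {name: math.exp(beta) for name, beta in BETA.items()}

def relative_hazard(patient, reference):
    """Hazard of `patient` relative to `reference` under the Cox model."""
    linear_predictor = sum(
        BETA[name] * (patient[name] - reference[name]) for name in BETA
    )
    return math.exp(linear_predictor)

# Hypothetical profiles that differ only in SBP (120 vs 140 mmHg).
reference = {"age": 60, "sbp": 140, "hba1c": 6.0,
             "atherogenic_index": 4.5, "male": 1, "smoker": 0}
patient = dict(reference, sbp=120)
ratio = relative_hazard(patient, reference)
```

With these coefficients, a 20 mmHg lower SBP corresponds to a hazard roughly 16% lower, all else being equal.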

RESULTS

Given the amount and extent of the results, these are given in detail in the Supplemental Information (S2 and S3). However, we provide here the main results of our example. As before, the analysis was done with R 2.13.2 and IBM SPSS Statistics 19.

Construction of the model

The parameters of the Cox model with time-dependent variables are shown in Table 1, and its adaptation to the points system with a prediction time of 2 years is reflected in Fig. 1. Table 2 shows the joint models for the longitudinal parameters. To avoid computational cost, simple models were used: (1) a linear equation for the survival part with all the predictors included (age, gender, smoking, and longitudinal marker) and (2) a mixed linear model with polynomial degree 1 at (1, t), in both the fixed and the random parts. The baseline risk function was defined piecewise.
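The model structure just described can be written out explicitly. The following is a sketch in the usual joint-model notation (Rizopoulos, 2012); the symbols are assumptions chosen for this illustration and may differ from those used elsewhere in the paper. Here y_i(t) is the observed value of one longitudinal parameter for patient i, m_i(t) its underlying trajectory, b_{i0} and b_{i1} the random intercept and slope, and h_0(t) the piecewise baseline risk function.

```latex
\begin{aligned}
y_i(t) &= m_i(t) + \varepsilon_i(t), \qquad \varepsilon_i(t) \sim N(0, \sigma^2),\\
m_i(t) &= (\beta_0 + b_{i0}) + (\beta_1 + b_{i1})\, t,\\
h_i(t) &= h_0(t)\, \exp\{\gamma_1\,\mathrm{male}_i + \gamma_2\,\mathrm{age}_i
          + \gamma_3\,\mathrm{smoker}_i + \alpha\, m_i(t)\}.
\end{aligned}
```

Under this reading, the "Event process" rows of Table 2 would correspond to γ1, γ2, γ3 and α, the fixed-effects rows to β0 and β1, and the random-effects rows to the variability of b_{i0}, b_{i1} and ε_i(t). For HbA1c the random slope was dropped because of convergence problems, so b_{i1} would be absent from that model.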

Statistical validation by simulation

The C-statistic was very satisfactory: 0.844 (95% CI [0.842–0.846]). Comparison between expected and observed events in all the risk groups showed no significant differences (Fig. 2).
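The comparison of expected and observed events relies on the Pearson χ² test named in the validation steps. As a reminder of what that statistic measures, here is a minimal sketch with hypothetical group counts (not the values behind Fig. 2). The function name `pearson_chi2` is illustrative; a full test would also need the χ² distribution to obtain a p-value, and may include the non-event cells of each risk group, both of which are left out here.

```python
def pearson_chi2(observed, expected):
    """Pearson statistic: sum over risk groups of (O - E)^2 / E."""
    if len(observed) != len(expected):
        raise ValueError("observed and expected must have the same length")
    return sum((o - e) ** 2 / e for o, e in zip(observed, expected))

# Hypothetical event counts in four risk groups (lowest to highest risk).
observed_events = [3, 9, 21, 40]
expected_events = [4.0, 10.0, 20.0, 38.0]
statistic = pearson_chi2(observed_events, expected_events)
```

A small statistic relative to the number of groups indicates that the points system predicts about as many events as were observed in each group.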

Explanation of potential utilization

A new patient arrives at our office with the following characteristics: male, 83 years old, non-smoker, and taking pharmacological medication (one antihypertensive drug and one oral antidiabetic agent) and non-pharmacological measures (diet and exercise). His history of cardiovascular risk factors is available (Table 3).

Application of the new model gives a histogram of the cardiovascular risk score obtained for this patient (Fig. 3). This chart shows a high cardiovascular risk, as most of the simulations have around 16 points. The estimate of the score was 16 (95% CI [15–17]).






Figure 1 Scoring system to predict cardiovascular diseases within 2 years. Abbreviations: SBP, systolic blood pressure; HbA1c, glycated haemoglobin; TC, total cholesterol; HDL-c, HDL cholesterol.

The median score corresponded to an SBP of 160 mmHg, HbA1c of 5.0% and an atherogenic index of 6.76. Bearing in mind that the model contains factors upon which it is not possible to act (gender and age) that give the patient a minimum of 13 points, we should consider strategies to help the patient not to score in the other categories on the scale (Fig. 1).

The clinician can now see that if the patient complies with a series of interventions (pharmacological (add two antihypertensive drugs → −20 mmHg; prescribe a statin → −40% atherogenic index) and non-pharmacological (reduce salt in the diet → −5 mmHg)), his longitudinal parameters after 3 months would be: SBP 120 mmHg (145 − 2 × 10 − 5 = 120 mmHg), atherogenic index 3.10 (5.17 − 40% = 3.10), and HbA1c 4.9% (same value because no intervention was done). Applying the model using the new information gives the cardiovascular risk at 2 years (Fig. 4). The estimate of the score is 15 (95% CI [14–15]) and the values that provide a median score are: SBP 124 mmHg, atherogenic index 4.85, and HbA1c 5.0%. Thus, the risk is reduced, as now the patient has 15 points (Fig. 1 and S3).

Table 2 Parameters of the joint models with the longitudinal parameters studied. The strategy to eliminate variables is to eliminate down from the most complex terms to the most simple terms. Goodness-of-fit: (1) SBP: χ² = 371,574.1, p < 0.001; (2) HbA1c: χ² = 210,881.1, p < 0.001; (3) Atherogenic index: χ² = 121,118.0, p < 0.001.

| Variable | SBP (mmHg) | p-value | HbA1c (%) | p-value | Atherogenic index | p-value |
|---|---|---|---|---|---|---|
| Event process | | | | | | |
| Male gender | 0.428 | <0.001 | 0.475 | <0.001 | 0.446 | <0.001 |
| Age (per 1 year) | 0.0837 | <0.001 | 0.0840 | <0.001 | 0.0833 | <0.001 |
| Smoker | 0.731 | <0.001 | 0.757 | <0.001 | 0.775 | <0.001 |
| Parameter (per 1 unit) | 0.0085 | <0.001 | 0.216 | <0.001 | 0.195 | <0.001 |
| Longitudinal process: fixed effects | | | | | | |
| 1 | 133.557 | <0.001 | 6.158 | <0.001 | 4.602 | <0.001 |
| t | 0.0046 | <0.001 | 0.0001 | <0.001 | 0.0001 | <0.001 |
| Longitudinal process: random effects | | | | | | |
| 1 | 21.683 | N/A | 1.346 | N/A | 1.324 | N/A |
| t | 0.0358 | N/A | * | * | 0.0013 | N/A |
| Residual | 8.933 | N/A | 0.357 | N/A | 0.302 | N/A |

Notes. Abbreviations: SBP, systolic blood pressure; HbA1c, glycated haemoglobin; N/A, not applicable; *, term eliminated due to convergence problems.

Figure 2 Comparison between the proportions (%) of expected and observed events in each of the different risk groups.
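The post-intervention targets quoted in the worked example follow from simple arithmetic on the patient's baseline values (Table 3, t = 0). A small sketch, with the effect sizes taken from the text (10 mmHg per added antihypertensive drug, 5 mmHg for salt reduction, 40% for the statin); the function and variable names are hypothetical.

```python
# Baseline values of the example patient (Table 3, time 0).
baseline = {"sbp": 145.0, "atherogenic_index": 5.17, "hba1c": 4.9}

def apply_interventions(values, n_new_antihypertensives=2,
                        salt_reduction=True, statin=True):
    """Return the target values the clinician enters for the next visit."""
    target = dict(values)
    target["sbp"] -= 10.0 * n_new_antihypertensives   # -10 mmHg per drug
    if salt_reduction:
        target["sbp"] -= 5.0                          # -5 mmHg from diet
    if statin:
        target["atherogenic_index"] *= 1 - 0.40       # -40 %
    return target                                     # HbA1c is untouched

target = apply_interventions(baseline)
```

These targets are only the values inserted at the intermediate visit; the values at the prediction time (for example the SBP of 124 mmHg reported above) come from re-running the joint-model simulations with the extended history.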

DISCUSSION

This paper describes a method to construct predictive models for CVD considering the variability of cardiovascular risk factors and at the same time having the simplicity of points systems, which are widely used in daily clinical practice worldwide (Conroy et al., 2003; Cooney, Dudina & Graham, 2009; National Heart, Lung, and Blood Institute, 2015).

The cardiovascular risk scales currently available do not take into account the temporal variability of the parameters controlling the risk factors, although a very positive aspect of these scales is that they are simple enough for immediate application by healthcare professionals, the persons who really have to apply these mathematical models (Conroy et al., 2003; Cooney, Dudina & Graham, 2009; National Heart, Lung, and Blood Institute, 2015). The joint models currently used do take into account variability over time of a single longitudinal parameter (Rizopoulos, 2011), but their interpretation is not as easy as a points system and they cannot be used with various longitudinal parameters, a key question in the multifactorial aetiology of CVD. We have attempted to fuse all these techniques into one single algorithm, retaining the virtues of each (relative risks model, scoring systems, dynamic predictions...).

Table 3 History of the control parameters of the cardiovascular risk factors included in our points system. Time has a negative value because it refers to the measurements taken before the baseline situation and this was defined as t = 0.

| Time (days) | SBP (mmHg) | HbA1c (%) | Atherogenic index |
|---|---|---|---|
| −360 | 152 | 5.1 | 3.56 |
| −330 | 135 | 5.3 | 3.23 |
| −270 | 164 | 4.7 | 3.45 |
| −180 | 153 | 4.4 | 4.12 |
| −90 | 170 | 5.0 | 4.15 |
| 0 | 145 | 4.9 | 5.17 |

Notes. Abbreviations: SBP, systolic blood pressure; HbA1c, glycated haemoglobin.

Figure 3 Cardiovascular risk of a theoretical new patient (pre-intervention).

Figure 4 Cardiovascular risk of a theoretical new patient (post-intervention).

Comparison between our proposed model and current cardiovascular risk scales is problematic. Our model is more suitable for short-term predictions, since the more time that passes from the baseline situation (t = 0) when making a prediction, the greater the variability of the predictions of the longitudinal parameters (Rizopoulos, 2012). This same situation can be found in other areas, such as the economy (stock exchange) or meteorology (weather forecast), though it obviously depends on the nature of the data being used, both at the individual level and the population level. This, however, does not weaken our model, because the predictions for the longitudinal parameters are dynamic (Rizopoulos, 2011): any time that we update the clinical information about our patient the risk is immediately recalculated. This can be seen in the proposed example (S2 and Figs. 3 and 4), where introducing new values for the longitudinal parameters updates the predictions and a new score for the patient is calculated. In other words, the proposed method could be used to calculate the patient’s risk every time the patient attends the office, whereas the traditional risk scales can be used with a longer time interval, as the prognosis is for 10 years. Thus, the two types of model could be used to assess the risk, for both the short term and the long term. Although discrepancies exist between short-term and long-term predictions of CVD (Quispe et al., 2015), the regular use of short-term predictions, bearing in mind the variability of the risk factors, can complement the long-term cardiovascular models. In other words, our intention is for clinical practice to use the short-term model regularly in those patients who attend their physician’s office frequently and use the long-term model in those who only attend occasionally.

Obtaining simulations from longitudinal parameters is not easy and implies a computational cost of about one minute with the statistical package R to generate a total of 100 simulations using a normal computer. On the other hand, the historical values of the longitudinal parameters are recorded in the clinical history, which nowadays is usually electronic (Palazón-Bru et al., 2014). Given this situation, all the information needed to apply our models is already computerised, so the algorithms implemented in the statistical package R can be adapted to the underlying language of the database containing the values of the risk factors. Thus, all the calculations will be immediate for the healthcare professional. In other words, just pressing a key will be enough to bring up on the screen in a very short time the histogram shown in Fig. 3 and S2, the theoretical points system and the set of values of the risk factors determining the median score. In addition, when the physician decides to intervene he or she will indicate the duration of the intervention and the possible values for the new patient. After introducing this new information the two histograms could be shown together (Figs. 3 and 4, and S2), which will enable the physician to see the benefit of the intervention.

As this algorithm was developed from a set of simulated data, we encourage others who have cardiovascular databases like that used here to implement a model with the characteristics described herein. Thus, if using real-life data achieves greater predictive precision, we shall be able to apply this method to obtain the best short-term prognosis and thus take the most appropriate decisions for the benefit of the patient. Nevertheless, we should note that the method proposed is based on the combination of mathematical models already used in medicine; therefore, in theory our model is sound, as we have been strict in each of the steps to follow. In practice we can determine the value of t and the complexity of the models in order to apply the method proposed. Finally, and importantly, the algorithm developed in this study can be used for other diseases or knowledge areas like the economy.

CONCLUSIONS

We developed an algorithm to construct cardiovascular risk scales based on a points system that also takes into account the variability of the risk factors. These issues are important, as the popularity of points systems in clinical practice and the improved predictive accuracy from using all the information recorded in the clinical history will improve the currently used procedure. The theoretical construction of our method is based on the combination of mathematical models already used in medicine, taking into account the characteristics of each of these other models. As mentioned, the prediction time and the structure of each of the models can change in practice, and the method can be used for other diseases apart from CVD or even applied to other areas of knowledge. Finally, as we do not have real data available for its immediate application in clinical practice, we encourage others to use our methods with their own data sets. In the case of CVD, traditional cohort studies should be done, but recording repeated measurements of risk factors both during the follow-up and for the period immediately prior to baseline.

ACKNOWLEDGEMENTS

The authors thank Ian Johnstone for help with the English language version of the text.

ADDITIONAL INFORMATION AND DECLARATIONS

Funding
The authors received no funding for this work.

Competing Interests
Antonio Palazón-Bru serves as an academic editor for PeerJ.

Author Contributions
• Antonio Palazón-Bru conceived and designed the experiments, performed the experiments, analyzed the data, contributed reagents/materials/analysis tools, wrote the paper, prepared figures and/or tables, reviewed drafts of the paper.
• Julio A. Carbayo-Herencia and Vicente F. Gil-Guillén conceived and designed the experiments, contributed reagents/materials/analysis tools, reviewed drafts of the paper.
• Maria Isabel Vigo conceived and designed the experiments, reviewed drafts of the paper.

Data Availability
The following information was supplied regarding data availability:

Our data set is simulated and we have explained how to obtain it in the Supplemental Information.

Supplemental Information
Supplemental information for this article can be found online at http://dx.doi.org/10.7717/peerj.1673#supplemental-information.

REFERENCES

American Diabetes Association. 2014. Standards of medical care in diabetes-2014. Diabetes Care 37(Suppl 1):S14–S80 DOI 10.2337/dc14-S014.

Andersen PK, Gill RD. 1982. Cox’s regression model for counting processes: a large sample study. Annals of Statistics 10:1100–1120 DOI 10.1214/aos/1176345976.

Artigao-Ródenas LM, Carbayo-Herencia JA, Palazón-Bru A, Divisón-Garrote JA, Sanchis-Domènech C, Vigo-Aguiar I, Gil-Guillén VF, on behalf of GEVA. 2015. Construction and validation of a 14-year cardiovascular risk score for use in the general population: the PURAS-GEVA chart. Medicine 94:e1980 DOI 10.1097/MD.0000000000001980.


Brown ER, Ibrahim JG, DeGruttola V. 2005. A flexible B-spline model for multiple longitudinal biomarkers and survival. Biometrics 61:64–73 DOI 10.1111/j.0006-341X.2005.030929.x.

Conroy RM, Pyörälä K, Fitzgerald AP, Sans S, Menotti A, De Backer G, De Bacquer D, Ducimetière P, Jousilahti P, Keil U, Njølstad I, Oganov RG, Thomsen T, Tunstall-Pedoe H, Tverdal A, Wedel H, Whincup P, Wilhelmsen L, Graham IM, SCORE project group. 2003. Estimation of ten-year risk of fatal cardiovascular disease in Europe: the SCORE project. European Heart Journal 24:987–1003 DOI 10.1016/S0195-668X(03)00114-3.

Cooney MT, Dudina AL, Graham IM. 2009. Value and limitations of existing scores for the assessment of cardiovascular risk: a review for clinicians. Journal of the American College of Cardiology 54:1209–1227 DOI 10.1016/j.jacc.2009.07.020.

Faucett CL, Thomas DC. 1996. Simultaneously modelling censored survival data and repeatedly measured covariates: a Gibbs sampling approach. Statistics in Medicine 15:1663–1685 DOI 10.1002/(SICI)1097-0258(19960815)15:15<1663::AID-SIM294>3.0.CO;2-1.

Henderson R, Diggle P, Dobson A. 2000. Joint modelling of longitudinal measurements and event time data. Biostatistics 1:465–480 DOI 10.1093/biostatistics/1.4.465.

Hosmer DW, Lemeshow S. 2000. Applied logistic regression. New York: Wiley.

Hosmer DW, Lemeshow S, May S. 2008. Applied survival analysis: regression modeling of time-to-event data. New York: Wiley.

James PA, Oparil S, Carter BL, Cushman WC, Dennison-Himmelfarb C, Handler J, Lackland DT, LeFevre ML, MacKenzie TD, Ogedegbe O, Smith Jr SC, Svetkey LP, Taler SJ, Townsend RR, Wright Jr JT, Narva AS, Ortiz E. 2014. 2014 evidence-based guideline for the management of high blood pressure in adults: report from the panel members appointed to the Eighth Joint National Committee (JNC 8). JAMA 311:507–520 DOI 10.1001/jama.2013.284427.

Lloyd-Jones DM. 2010. Cardiovascular risk prediction: basic concepts, current status, and future directions. Circulation 121:1768–1777 DOI 10.1161/CIRCULATIONAHA.109.849166.

Molinero LM. 2003. Modelos de riesgo cardiovascular. Estudio de Framingham. Proyecto SCORE. Available at http://www.seh-lelha.org/pdf/modelries.pdf (accessed July 2015).

National Cholesterol Education Program (NCEP) Expert Panel on Detection, Evaluation, and Treatment of High Blood Cholesterol in Adults (Adult Treatment Panel III). 2002. Third report of the national cholesterol education program (NCEP) Expert panel on detection, evaluation, and treatment of high blood cholesterol in adults (adult treatment panel III) final report. Circulation 106:3143–3421.

National Heart, Lung, and Blood Institute (Boston University). 2015. The Framingham heart study. Available at http://www.framinghamheartstudy.org/ (accessed July 2015).

Palazón-Bru A, Gil-Guillén VF, Orozco-Beltrán D, Pallarés-Carratalá V, Valls-Roca F, Sanchís-Domenech C, Martín-Moreno JM, Redón J, Navarro-Pérez J, Fernández-Giménez A, Pérez-Navarro AM, Trillo JL, Usó R, Ruiz E. 2014. Is the physician’s behavior in dyslipidemia diagnosis in accordance with guidelines? Cross-sectional ESCARVAL study. PLoS ONE 9:e91567 DOI 10.1371/journal.pone.0091567.

Perk J, De Backer G, Gohlke H, Graham I, Reiner Z, Verschuren M, Albus C, Benlian P, Boysen G, Cifkova R, Deaton C, Ebrahim S, Fisher M, Germano G, Hobbs R, Hoes A, Karadeniz S, Mezzani A, Prescott E, Ryden L, Scherer M, Syvänne M, Scholte op Reimer WJ, Vrints C, Wood D, Zamorano JL, Zannad F, European Association for Cardiovascular Prevention & Rehabilitation (EACPR), ESC Committee for Practice Guidelines (CPG). 2012. European Guidelines on cardiovascular disease prevention in clinical practice (version 2012). The Fifth Joint Task Force of the European Society of Cardiology and Other Societies on Cardiovascular Disease Prevention in Clinical Practice (constituted by representatives of nine societies and by invited experts). European Heart Journal 33:1635–1701 DOI 10.1093/eurheartj/ehs092.

Quispe R, Bazo-Alvarez JC, Burroughs Peña MS, Poterico JA, Gilman RH, Checkley W, Bernabé-Ortiz A, Huffman MD, Miranda JJ, PERU MIGRANT Study, CRONICAS Cohort Study Group. 2015. Distribution of short-term and lifetime predicted risks of cardiovascular diseases in Peruvian adults. Journal of the American Heart Association 4:e002112 DOI 10.1161/JAHA.115.002112.

Rizopoulos D. 2011. Dynamic predictions and prospective accuracy in joint models for longitudinal and time-to-event data. Biometrics 67:819–829 DOI 10.1111/j.1541-0420.2010.01546.x.

Rizopoulos D, Ghosh P. 2011. A Bayesian semiparametric multivariate joint model for multiple longitudinal outcomes and a time-to-event. Statistics in Medicine 30:1366–1380 DOI 10.1002/sim.4205.

Rizopoulos D. 2012. Joint models for longitudinal and time-to-event data with applications in R. Boca Raton: CRC Press.

Stone NJ, Robinson JG, Lichtenstein AH, Bairey Merz CN, Blum CB, Eckel RH, Goldberg AC, Gordon D, Levy D, Lloyd-Jones DM, McBride P, Schwartz JS, Shero ST, Smith Jr SC, Watson K, Wilson PW, American College of Cardiology/American Heart Association Task Force on Practice Guidelines. 2014. 2013 ACC/AHA guideline on the treatment of blood cholesterol to reduce atherosclerotic cardiovascular risk in adults: a report of the American College of Cardiology/American Heart Association Task Force on Practice Guidelines. Journal of the American College of Cardiology 63(25 Pt B):2889–2934 Erratum in: Journal of the American College of Cardiology 63(25 Pt B):3024–3025 DOI 10.1016/j.jacc.2013.11.002.

Sullivan LM, Massaro JM, D’Agostino Sr RB. 2004. Presentation of multivariate data for clinical use: the Framingham study risk score functions. Statistics in Medicine 23(10):1631–1660 DOI 10.1002/sim.1742.

Verbeke G, Lesaffre E. 1997. The effect of misspecifying the random effects distribution in linear mixed models for longitudinal data. Computational Statistics and Data Analysis 23:541–556 DOI 10.1016/S0167-9473(96)00047-3.

Vonesh EF, Greene T, Schluchter MD. 2006. Shared parameter models for the joint analysis of longitudinal data and event times. Statistics in Medicine 25:143–163 DOI 10.1002/sim.2249.

Wang Y, Taylor JMG. 2001. Jointly modeling longitudinal and event time data with application to acquired immunodeficiency syndrome. Journal of the American Statistical Association 96:895–905 DOI 10.1198/016214501753208591.

World Health Organization. 2014. The top 10 causes of death. Available at http://www.who.int/mediacentre/factsheets/fs310/en/ (accessed July 2015).

Zeng D, Cai J. 2005. Simultaneous modelling of survival and longitudinal data with an application to repeated quality of life measures. Lifetime Data Analysis 11:151–174 DOI 10.1007/s10985-004-0381-0.



3. CONCLUSIONS.

• New improvements are provided for existing models that are based on explanatory variables measured at baseline.

• The proposed improvements make it possible to calculate cardiovascular risk over shorter time periods and to choose the best combination of variables in order to obtain the most accurate prognosis.

• Short- and medium-term risk calculation could be implemented for current cardiovascular risk scales without the need to modify the predictive model.

• The method that analyses all possible combinations of explanatory variables should be applied in the construction of new cardiovascular scales.

• An algorithm has been developed to construct cardiovascular risk scales based on points systems, taking into account the temporal variability of the risk factors.

• The theoretical construction of our method is based on the combination of mathematical models used in the scientific literature, taking into account the characteristics of each of them. As detailed theoretically, the proposed algorithm admits modifications both in the prediction time and in the structure of each of the models with a view to practical use, which facilitates its adaptation for predicting the risk of diseases other than cardiovascular disease, as well as for other areas of knowledge.

