Estimating Odds Ratios in Logistic Regression of Dichotomous Data

Research Article

Open Access

Oyeka ICA¹ and Okeh UM²*

¹Department of Applied Statistics, Nnamdi Azikiwe University, Awka Nigeria

²Department of Industrial Mathematics and Applied Statistics, Ebonyi State University Abakaliki, Nigeria

*Corresponding author:

Okeh UM
Department of Industrial Mathematics and Applied Statistics
Ebonyi State University Abakaliki, Nigeria
E-mail: uzomaokey@ymail.com

Received January 03, 2013; Published January 08, 2013

Citation: Oyeka ICA, Okeh UM (2013) Estimating Odds Ratios in Logistic Regression of Dichotomous Data. 2:608 doi:10.4172/scientificreports.608

Copyright: © 2013 Oyeka ICA, et al. This is an open-access article distributed under the terms of the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original author and source are credited.

Abstract

This paper proposes an odds-ratio-type measure of the strength of association between screening test results and the state of nature or condition in a population, for diagnostic screening tests, based on logistic regression analysis of dichotomous outcomes. The proposed method, unlike the analysis of data from most screening tests, requires that the response to the condition of interest be dichotomous, assuming one of two possible values. The predisposing factors in this study are categorical variables. This enables the fitting of a logistic regression model to help estimate the desired probabilities, odds and odds ratios of positive responses. A test statistic to assess the statistical significance of the proposed measure, based on the logistic regression, is developed. The proposed method is illustrated with some sample data, and the results are shown to compare favourably with those obtained using the usual expression for the odds ratio.

Keywords

Gestational diabetes mellitus; Odds; Odds ratio; Logistic regression; Dichotomous

Introduction

Often a candidate for an examination or a job interview may wish to estimate the probability of success given some predisposing factors, such as the number of hours studied per day or per week; the nature, type and duration of the examination; and the candidate's prior qualifications, age, gender, ethnic group, state of origin, etc. A clinician conducting a diagnostic test or drug trial for a certain condition may wish to know the odds that subjects or patients respond positive given their various characteristics, such as age, gender, body weight, family history, etc. [1]. A gynecologist or a pediatrician may wish to estimate the odds that a new-born baby is underweight or has a longer than normal gestation period given the mother's age, parity and body weight, the child's gender, etc. [2]. In all these situations the response to the condition of interest is dichotomous, assuming one of two possible values. The predisposing factors may be either categorical or continuous variables [3]. This enables the fitting of a logistic regression model to help estimate the desired probabilities, odds and odds ratios of positive responses, as discussed below [4-6].

The proposed model

Let y_i be the response of the i-th randomly selected subject to the condition of interest, assuming a value of either 1 (positive response) or 0 (negative response), for i = 1, 2, ..., n. Let x_i1, x_i2, ..., x_ik be the scores of the i-th subject on the independent, explanatory, or predetermined variables X₁, X₂, ..., X_k respectively.

The following analysis of variance (ANOVA) table, Table 1, is used to test the adequacy of Equations 4 and 5 based on the F-test statistic:

Table 1: Four-fold table for the screening test results and gold standard of risk pregnant women for GDM.

X is an n×(k+1) matrix of regressors. The null hypothesis to be tested for the adequacy of Equation 4 using the results of Table 1 is

H₀: β₁ = β₂ = ... = β_k = 0 vs H₁: β_j ≠ 0 for some j = 1, 2, ..., k (6)

H₀ is rejected at the α level of significance if the computed F statistic exceeds the critical value F_{1-α; k, n-k-1}.

Otherwise H₀ is accepted, where F_{1-α; k, n-k-1} is the critical value of the F-distribution with k and n-k-1 degrees of freedom for a specified α level. If the model fits, that is, if H₀ is rejected so that not all the β_j are zero, then we may proceed to estimate the required probabilities, odds and odds ratios of positive responses to the condition of interest. Thus, assuming that H₀ is rejected, we estimate from Equations 4 and 5 the odds that the i-th subject responds positive to the condition under study given the independent variables X₁ = x_i1, X₂ = x_i2, ..., X_k = x_ik as
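As a minimal sketch of this estimation step, assuming Equations 4 and 5 take the usual logit form (log-odds equal to β₀ + β₁x_i1 + ... + β_k x_ik), the odds and the probability of a positive response could be computed as follows. The function names and the numerical values are hypothetical and serve only as illustration; they are not taken from the paper.

```python
import math

def linear_predictor(beta0, betas, xs):
    """Return beta0 + sum_j beta_j * x_j for one subject."""
    if len(betas) != len(xs):
        raise ValueError("betas and xs must have the same length")
    return beta0 + sum(b * x for b, x in zip(betas, xs))

def odds(beta0, betas, xs):
    """Estimated odds of a positive response: exp(linear predictor)."""
    return math.exp(linear_predictor(beta0, betas, xs))

def probability(beta0, betas, xs):
    """Estimated probability of a positive response: odds / (1 + odds)."""
    o = odds(beta0, betas, xs)
    return o / (1.0 + o)

# Hypothetical coefficients and covariate values, for illustration only.
beta0, betas = -1.0, [0.5, -0.25]
xs = [1, 0]
print(odds(beta0, betas, xs))         # exp(-0.5)
print(probability(beta0, betas, xs))  # exp(-0.5) / (1 + exp(-0.5))
```

The probability is recovered from the odds rather than computed separately, which keeps the two quantities consistent with each other.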

Estimating the odds ratio from logistic regression of data

Note that since the right hand side of Equation 10 is independent of i, for i = 1, 2, ..., n, Equation 10 may be interpreted as the estimated odds ratio of positive responses by any randomly selected subject under the specified conditions. In obtaining the odds ratio of Equation 10 it is assumed that some independent variables are increased or decreased by some constant. It is, however, also possible that some of these independent variables are increased or decreased proportionately, that is, by some percentage or proportion of the independent variables themselves. Thus suppose X_l is increased by a proportion α of itself, assuming a value of (1+α)x_il, and X_s is decreased by a proportion Υ of itself, assuming a value of (1-Υ)x_is, holding other independent variables constant. Then the resulting odds of positive response by the i-th randomly selected subject is,
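The contrast between a constant change and a proportional change can be sketched numerically. The example below assumes the usual logit form for the model; the helper names, coefficients and covariate values are hypothetical and are not the paper's Equation 10 or its data. Under that assumption, a constant shift gives an odds ratio of exp(Σ β_j c_j), which is the same for every subject, whereas a proportional change gives exp(Σ β_j p_j x_ij), which depends on the subject's own covariate values.

```python
import math

def odds(beta0, betas, xs):
    """Odds of a positive response under an assumed logit model."""
    return math.exp(beta0 + sum(b * x for b, x in zip(betas, xs)))

def odds_ratio_constant(betas, shifts):
    """Odds ratio when each x_j becomes x_j + c_j: exp(sum of beta_j * c_j)."""
    return math.exp(sum(b * c for b, c in zip(betas, shifts)))

def odds_ratio_proportional(betas, xs, props):
    """Odds ratio when each x_j becomes (1 + p_j) * x_j: exp(sum of beta_j * p_j * x_j)."""
    return math.exp(sum(b * p * x for b, p, x in zip(betas, props, xs)))

# Hypothetical values, for illustration only.
beta0, betas = -0.2, [0.4, -0.3]
subject_a, subject_b = [2.0, 1.0], [5.0, 3.0]

shifts = [1.0, -1.0]   # first variable up by 1, second down by 1
props = [0.10, -0.20]  # first variable up by 10%, second down by 20%

for xs in (subject_a, subject_b):
    print(odds_ratio_constant(betas, shifts),
          odds_ratio_proportional(betas, xs, props))
```

In this sketch the first printed value is identical for both subjects, while the second differs between them, so a proportional odds ratio has to be reported for stated covariate values.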

Illustrative example

Table 2 shows data obtained from the medical records units of a collection of hospitals in Ebonyi State, covering January 2010 to December 2011. The data come from a retrospective study of the effect of four independent risk factors (variables) on the development of gestational diabetes mellitus (GDM). A sample of 301 at-risk pregnant women who satisfied the inclusion criteria based on the WHO (1999) standard [7] was considered. All the risk factors considered in this work (family history (FH), obesity, age, and previous fetal weight (PFW)) are dichotomous and have been coded for use in estimating the odds ratio in logistic regression. The dependent variable is GDM. We here present sample data obtained in a diagnostic screening test to confirm the presence or absence of GDM among the sampled subjects from a certain population. The proposed method is illustrated using the sample data of Table 1.

Testing the adequacy of model

Regression analysis gave the following coefficient estimates: Obesity = -0.082, Age = -0.020, Family History (FH) = -0.211, PFW = -0.125 and Constant = 3.503. With respect to Equation 6, these estimates are nonzero, that is, β_j ≠ 0.

We therefore reject H₀ and conclude that the risk factors have a significant relationship with GDM. The odds ratio values for the risk factors are Obesity = 0.921, Age = 0.981, FH = 0.810, PFW = 0.88 and Constant = 33.209. The significance values for the risk factors are Obesity = 0.747, Age = 0.981, FH = 0.810, PFW = 0.882 and Constant = 33.209. These indicate high significance for their relationship. They also show the effects of these risk factors on the occurrence of GDM.
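The link between the reported coefficients and the reported odds ratios can be checked directly, since in a logistic regression each odds ratio is the exponential of the corresponding coefficient. The short sketch below uses only the values quoted in the text; the variable and function names are illustrative. Small differences in the last decimal place are to be expected, because the published coefficients are rounded to three decimals.

```python
import math

# Coefficient estimates and odds ratios as reported in the text.
coefficients = {"Obesity": -0.082, "Age": -0.020, "FH": -0.211,
                "PFW": -0.125, "Constant": 3.503}
reported = {"Obesity": 0.921, "Age": 0.981, "FH": 0.810,
            "PFW": 0.88, "Constant": 33.209}

def odds_ratios(coefs):
    """Exponentiate each coefficient to obtain its odds ratio."""
    return {name: math.exp(b) for name, b in coefs.items()}

computed = odds_ratios(coefficients)
for name, value in computed.items():
    print(f"{name}: exp({coefficients[name]}) = {value:.3f} "
          f"(reported {reported[name]})")
```

Each recomputed value agrees with the reported odds ratio to within 0.01, which is consistent with rounding in the published coefficients.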

Summary and Conclusion

This paper has proposed a statistical method for measuring the probability of success, given some predisposing factors, from the association between diagnostic screening test results and the state of nature or condition in a population, based on the probabilities, odds and odds ratios estimated from logistic regression. A test statistic for the statistical significance of the proposed measure of association is developed, based on the estimation of a logistic regression of the dichotomous dependent variable on some covariates. The proposed method is illustrated with sample data and shown to compare favourably with results that would have been obtained using the traditional expression for the odds ratio and other statistics for measuring association.

References

Hall GH, Round AP (1994) Logistic regression-explanation and use. J R Coll Physicians Lond 28: 242-246.

Lee J (1986) An insight on the use of multiple logistic regression analysis to estimate association between risk factors and disease occurrence. Int J Epidemiol 15: 22-29.

Fleiss JL, Levin B, Paik MC (2003) Statistical Methods for Rates and Proportions. (3rd edn), Wiley, New York.

Fienberg SE (1980) The Analysis of Cross-Classified Categorical Data. (2nd edn), The MIT Press, Cambridge.

Hosmer DW, Lemeshow S (1989) Applied Logistic Regression. Wiley, New York.

Van Houwelingen JC, le Cessie S (1988) Logistic regression: a review. Stat Neerl 42: 215-232.

World Health Organization (1999) Definition, diagnosis and classification of diabetes mellitus and its complications, Part 1. WHO Department of Non communicable disease surveillance, Geneva.