Acta Univ. Agric. Silvic. Mendelianae Brun. 2017, 65(5), 1779-1791 | DOI: 10.11118/actaun201765051779

Economic Aspects of the Missing Data Problem - the Case of the Patient Registry

Hatice Uenal, David Hampel
Department of Statistics and Operation Analysis, Mendel University in Brno, Zemědělská 1, 613 00 Brno, Czech Republic

Registries are indispensable in medical studies and provide the basis for reliable answers to research questions. Depending on the intended use, high data quality is a prerequisite. However, costs rise as registry quality increases. Considering these time and cost factors, this work attempts to estimate the cost advantages of applying statistical tools to existing registry data, including quality evaluation. The quality analysis showed clear savings of millions in study costs from shortening the study time horizon, averaging € 523,126 for every year saved. In addition, replacing the missing data, which exceeded 25 % in some variables, greatly improved data quality. To conclude, our findings clearly show the importance of data quality and statistical input in avoiding biased conclusions due to incomplete data.

Keywords: Benford law, data source quality, missing-at-random mechanism, missing data problem, reducing study costs

Published: October 31, 2017

Uenal, H., & Hampel, D. (2017). Economic Aspects of the Missing Data Problem - the Case of the Patient Registry. Acta Universitatis Agriculturae et Silviculturae Mendelianae Brunensis, 65(5), 1779-1791. doi: 10.11118/actaun201765051779


This is an open access article distributed under the terms of the Creative Commons Attribution-NonCommercial-NoDerivatives 4.0 International License (CC BY-NC-ND 4.0), which permits non-commercial use, distribution, and reproduction in any medium, provided the original publication is properly cited. No use, distribution or reproduction is permitted which does not comply with these terms.