Introduction

Understanding the regional trends of major diseases is important because health data for a specific region can aid the establishment of effective healthcare policies. Large-scale longitudinal studies are the best available tool to extrapolate data from large districts (urban and rural areas) over time and to improve the understanding of various diseases in public health. Although a large cohort study may be the best tool to study regional trends, a high proportion of patients are lost to follow-up, which is an important consideration. In addition, large cohort studies only include those patients who receive treatment at specialist hospitals. In contrast, the National Health Insurance (NHI) claims records combine the advantages of extended follow-up and the absence of selection bias. Claims data can be analyzed to measure the prevalence of diseases, patterns of healthcare use, clinical outcomes, accessibility of health services, duration of treatment, cost of care, and adherence to good practice guidelines [1–4]. The NHI claims database is an example of “big data”, which is defined as large volumes of high-velocity, complex, and variable data that require advanced techniques and technologies to enable the capture, storage, distribution, management, and analysis of information.

In South Korea, the NHI system includes the health insurance system, which is financed by mandatory contributions, as well as medical aid, a social assistance scheme for the very poor, which is financed by general taxation. Over 95% of all residents in South Korea are covered by the NHI, whose claims database includes information about diagnostics, treatments, health service providers, and associated costs, which may be used to study the prevalence of various diseases.
However, access to the NHI data of all Korean patients is restricted, and only a few studies using these data have been carried out to date [5–15].

The aim of the current study was to investigate chronological patterns of diseases in the Northern Gyeonggi-do province of South Korea between 2002 and 2013, and to evaluate the regional differences in disease patterns between the Northern Gyeonggi-do province and South Korea as a whole.
1. Study design and participants

This study was based on NHI data from the National Screening Program. In South Korea, the NHI provides mandatory universal health insurance to nearly all Koreans, while the remainder are covered by a public assistance plan (i.e., Medicaid). The Ministry of Health and Welfare entrusts the handling of claims to the NHI, and so the data of Medicaid enrollees are also managed by the NHI. This study excluded the insured employee group from the study population because both employers and employees may incur penalties if they do not undergo health checkups. Because the NHI provided data without identification codes for the dependents of the insured employee group at the beginning of our study, data from these dependents were not analyzed in this study.

This study focused on an area of surveillance consisting of the Northern Gyeonggi-do province of South Korea, which includes 5 districts: 2 urban (Uijeongbu, Dongducheon) and 3 rural (Pocheon, Yangju, Yeoncheon) (Figure 1). Using the NHI Cohort Database based on the National Health Information Database (NHID Cohort 2002–2013), a retrospective, population-based study was conducted to investigate year-to-year trends of disease patterns between 2002 and 2013 and to evaluate differences in these patterns. The NHI provided a cohort of participants who were in health screening programs, called the National Health Insurance Service-Health Screening Cohort. To construct this database, a sample cohort was first selected from the 2002 and 2003 health screening participants, who were aged between 40 and 79 years in 2002 and were followed up through 2013.

This study searched the statistical data using the 3-character categories of the International Classification of Diseases, 10th Revision, Clinical Modification (ICD-10-CM). The ICD-10 codes of the diseases used to search the NHID-Cohort 2002–2013 are listed in Table 1.
From an epidemiological and public health perspective, these 12 diseases were selected because they were the most common diseases in the database of one regional center hospital of the Northern Gyeonggi-do province.
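
The search by 3-character ICD-10 category described above can be illustrated with a minimal Python sketch. The function names and the small category map are hypothetical and are not the study's Table 1; only three well-known categories (I10, I21, C53) are used here, purely as examples.

```python
def icd10_category(code: str) -> str:
    """Return the 3-character ICD-10 category of a code, e.g. 'I21.0' -> 'I21'."""
    return code.strip().upper().replace(".", "")[:3]


# Hypothetical category-to-label map for illustration only.
# This is NOT the list of codes from the study's Table 1.
TARGET_CATEGORIES = {
    "I10": "hypertension",
    "I21": "acute myocardial infarction",
    "C53": "uterine cervix cancer",
}


def match_disease(code: str):
    """Return the disease label for a claim code, or None if it is not targeted."""
    return TARGET_CATEGORIES.get(icd10_category(code))


# Example claim codes (made up) in mixed formatting.
claims = ["I21.0", "i10", "C53.9", "J45.0"]
matched = [match_disease(c) for c in claims]
print(matched)
```

Truncating to the category level groups all subcodes (for example I21.0 and I21.9) under one disease, which is the level of detail at which the diseases were searched.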
2. Data sources

This study used the NHI Cohort Database based on the NHID-Cohort 2002–2013 of the Health Insurance Review and Assessment Service (HIRA) for the period from 2002 to 2013. From the NHID for the year 2001, 46,614,378 NHI claims records were extracted, and a further 1,410,287 records were extracted from the medical aid database for the year 2002. Duplicated data, data of foreigners, data associated with erroneous resident registration numbers, and data of patients in the 0 quintile of household income were excluded, and a total of 46,605,433 NHI claims records were selected as the sampling population (Figure 2). From this population, 1,025,340 NHI claims records were sampled by proportional quota stratified random sampling to create the NHI cohort database. The NHI claims data were provided by the Korean HIRA, an independent body established to review the claims data and assess the quality of health care in South Korea. As NHI coverage is mandatory, the HIRA-run database contains information on all submitted claims and prescriptions for all beneficiaries of health insurance and medical aid.
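
The sampled cohort corresponds to roughly 2.2% of the sampling population (1,025,340 of 46,605,433). The idea behind proportional stratified random sampling can be sketched in Python as follows. This is only an illustration under stated assumptions: the function name, the strata (sex and age group), and the toy population are hypothetical, since the stratification variables used to build the NHI cohort are not described in this section.

```python
import random
from collections import defaultdict


def proportional_stratified_sample(records, stratum_of, n_total, seed=0):
    """Draw about n_total records so that each stratum keeps its population share.

    records    : list of records (any hashable or unhashable objects)
    stratum_of : function mapping a record to its stratum key
    n_total    : target overall sample size
    """
    rng = random.Random(seed)

    # Group records by stratum.
    strata = defaultdict(list)
    for rec in records:
        strata[stratum_of(rec)].append(rec)

    total = len(records)
    sample = []
    for key in sorted(strata):
        members = strata[key]
        # Quota proportional to the stratum's share of the population.
        # Rounding means the final size can differ slightly from n_total.
        quota = min(round(n_total * len(members) / total), len(members))
        # Simple random sampling without replacement within the stratum.
        sample.extend(rng.sample(members, quota))
    return sample


# Toy population (made up): records are (stratum, person_id) pairs,
# with strata defined by hypothetical (sex, age group) combinations.
population = (
    [(("F", "40-49"), i) for i in range(300)]
    + [(("F", "50-59"), i) for i in range(100)]
    + [(("M", "40-49"), i) for i in range(600)]
)
sample = proportional_stratified_sample(population, lambda rec: rec[0], n_total=100)
print(len(sample))
```

Because each stratum contributes in proportion to its size, the sample preserves the composition of the population with respect to the stratification variables.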
3. Statistical analysis

The NHI database includes information on almost the entire population of South Korea, so the assumption was made that sampling errors could be excluded. Accordingly, differences in percentages and rates were calculated without a p value.

This study analyzed the prevalence of diseases using the personal health registry data obtained from the NHI and compared the disease prevalence of the Northern Gyeonggi-do province with the national averages. The prevalence of the diseases among the 5 districts of the Northern Gyeonggi-do province was also compared.

Frequencies and percentile distributions were used to describe categorical variables. Continuous variables were presented as mean ± standard deviation.
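
The descriptive comparison above, in which regional and national prevalence are compared as plain differences without a p value, can be sketched in Python as follows. The function names, the per-1,000 scaling, and all counts are hypothetical and chosen only for illustration; they are not taken from the study.

```python
def prevalence_per_1000(cases, population):
    """Prevalence expressed as cases per 1,000 persons."""
    return 1000.0 * cases / population


def compare_prevalence(regional, national):
    """Compare yearly prevalence between a region and the nation.

    regional, national : dicts mapping year -> (cases, population)
    Returns a list of (year, regional_rate, national_rate, difference),
    where difference = regional_rate - national_rate (no significance test).
    """
    rows = []
    for year in sorted(regional):
        r = prevalence_per_1000(*regional[year])
        n = prevalence_per_1000(*national[year])
        rows.append((year, r, n, r - n))
    return rows


# Made-up counts for two years of the study period.
regional = {2002: (30, 10_000), 2013: (90, 10_000)}
national = {2002: (2_500, 1_000_000), 2013: (6_000, 1_000_000)}

for year, r, n, diff in compare_prevalence(regional, national):
    print(f"{year}: regional={r:.1f}, national={n:.1f}, difference={diff:+.1f} per 1,000")
```

A difference that grows from year to year corresponds to a widening gap between the region and the national average.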
4. Ethical consideration

The study protocol was approved by the institutional review board (IRB) of Uijeongbu St. Mary’s Hospital, the Catholic University of Korea (IRB No. UC15EISI0003). Informed consent was waived by the IRB.
Results

The time trends of the 12 diseases in the cohort from 2002 to 2013 are shown in Figure 3. All 12 diseases were diagnosed with increasing frequency in the Northern Gyeonggi-do province compared with the rest of South Korea. There were also several trends unique to individual diseases.

During the 12-year study period, there was a marked increase in newly diagnosed cases of uterine cervix cancer, urinary bladder cancer, and colon cancer in the Northern Gyeonggi-do province compared with the rest of South Korea. However, by 2013, newly diagnosed cancer cases had dropped markedly, reaching incidence rates similar to those of the rest of the country. Acute myocardial infarction and end-stage renal disease showed variable trends, with a sharp increase in disease prevalence in 2007. Furthermore, as time progressed, the gap between disease prevalence in the Northern Gyeonggi-do province and the rest of South Korea widened.

More gradual increasing trends over time were seen for psychiatric disorders, diabetes mellitus, hypertension, and peptic ulcer.

For intracranial hemorrhage and bronchitis/bronchiolitis, no obvious differences were found between the rates of disease prevalence in the Northern Gyeonggi-do province and the rest of South Korea.

In contrast, malaria showed a unique time trend. While there was no rise in malaria cases in other provinces of South Korea, cases of malaria in the Northern Gyeonggi-do province peaked in 2004, 2007, and 2009 to 2010.
Discussion

A wide variation in health outcomes often exists across different regions of a nation. Rapid growth in healthcare spending and wide regional variations in healthcare expenditure cause the government or healthcare policymakers to consider setting a target healthcare expenditure level for each province [2,16–19]. Therefore, local government and its healthcare policymakers must consider the best strategy to decide which diseases to target when only limited healthcare resources are available. Evaluation of regional differences in the incidence and distribution of disease over time will aid these decisions. This is thought to be the first study to use “big data” to investigate time trends of diseases nationally, as well as regionally in the Northern Gyeonggi-do province, using the NHI Cohort Database based on the National Health Information Database (NHID Cohort 2002–2013) over a 12-year period.

By definition, the term “big data” in healthcare refers to electronic health datasets so large and complex that they are difficult (or impossible) to manage with traditional software and/or hardware, or with common data management tools and methods. The unique properties of big data are volume, velocity, variety, and veracity. Healthcare institutes have generated large amounts of data, driven by record-keeping, compliance and regulatory requirements, and patient care. Whilst most data in the past were stored in hard copy form, the current trend is towards rapid digitization of these large amounts of data. In South Korea, all healthcare institutes have processed medical fees by electronic data interchange since 1998, and these claims data have been recorded as digitized data in the database of the National Health Insurance Service.
For more than 10 years, most healthcare institutes have operated electronic medical record systems connected to the electronic data interchange system.

It has been reported that big data analytics in healthcare offers advantages in several areas: 1) clinical operations, 2) research and development, 3) public health, 4) evidence-based medicine, 5) genomic analytics, 6) pre-adjudication fraud analysis, 7) device/remote monitoring, and 8) patient profile analytics. The analysis of disease patterns and the tracking of disease outbreaks and transmission can be particularly helpful in the establishment of public healthcare policies, which identify needs, provide services, and predict and prevent crises, especially for the benefit of populations.

This study demonstrated that people in the Northern Gyeonggi-do province have unique patterns of the 12 diseases analyzed compared with the other provinces of South Korea. These results can help healthcare providers or policymakers establish healthcare policies that are optimized for the population of the Northern Gyeonggi-do province. The main aim of the study was to provide governments and healthcare providers with a comprehensive view of the complex usage, requirements, and outcome trends of healthcare at the local or regional level. This information can help governments and healthcare providers allocate resources proactively and achieve the best outcomes. To do this well, the first step was to aggregate the healthcare-related data comprehensively and analyze them at the level of large populations. These data can be used to reduce waste, target healthcare services more directly to the areas of greatest need, and redirect spending to effective interventions.

Malaria occurred far more frequently in the Northern Gyeonggi-do province than the national average.
This study showed that malaria rates peaked in 2004, 2007, and 2009 to 2010 in the Northern Gyeonggi-do province, which suggests that public healthcare providers should review the reasons for these increases in malaria infections. The Korea Centers for Disease Control and Prevention reported that the spatial distribution of malaria cases during 2001 to 2010 was uneven, with the vast majority of cases being recorded in the northern provinces of South Korea, often very close to the South-North Korean border. Many of the malaria cases detected south of Seoul were attributed to military veterans within 2 years of separation from the Republic of Korea military. In these cases, malaria developed from exposure near the demilitarized zone. Approximately 60% of the cases of malaria were due to exposure 6–18 months prior to the onset of symptoms [22–24]. Another study indicated that a large percentage of civilians (non-veterans) who were reported to have contracted malaria south of Seoul/Gyeonggi province had also been exposed near the demilitarized zone. Therefore, some scientists claim that the re-emergence of malaria in South Korea can be attributed to the disease initially being spread from North Korea, and that a majority of the recorded cases were in military personnel, who were mainly located close to the South-North Korean border. However, it has recently been reported that the registered number of malaria cases has also increased in the civilian population living far from the South-North Korean border [24,26]. The results from these studies indicate that the government and healthcare policymakers should focus on prevention and control of malaria in the areas of outbreak. Malaria cases since 2004 are most likely influenced by environmental factors such as moderate rains that increase Anopheles and other mosquito populations, intense flooding that washes larvae from breeding sites, or semi-drought conditions that dry out larval habitats.
This suggests that annual trends in climatic variables and malaria prevalence should be analyzed together, and that collaboration between health and climate governance is imperative.

Generally, many diseases are more prevalent in an aging population. The aging population has increased in the Northern Gyeonggi-do province as well as nationwide. The Yeoncheon, Pocheon, and Dongducheon districts have larger aging populations than the national average. However, the rate of increase of the aging population in the Yangju district was similar to the national average. The aging population of the Uijeongbu district was lower than the national average and decreased further in 2010 compared with 2005, which is in contrast with the other districts and the nationwide trend (Figure 4). Considering these trends, additional factors may be influencing the regional disparity in disease prevalence in Northern Gyeonggi-do. Therefore, more detailed studies of specific diseases will be needed in the future.

A limitation of our study was that it evaluated the regional patterns of prevalence in the Northern Gyeonggi-do province but did not explain the exact reasons for the differences in prevalence. This study used the NHI Cohort Database; therefore, it was impossible to evaluate the incidence or prevalence of the disease in the Northern Gyeonggi-do province. In addition, the 12 diseases investigated were the most common diseases treated at a single regional center hospital of the Northern Gyeonggi-do province, and they may not necessarily be representative of the diseases of the province. Further studies will be needed in the future.

In conclusion, this study revealed that several diseases of the Northern Gyeonggi-do province showed unique and differentiated trends over time compared to other provinces in South Korea.
It also demonstrated that a “big data” study using the NHI Cohort Database based on the NHID Cohort 2002–2013 can provide useful insight into the healthcare environment for healthcare providers, stakeholders, and policymakers.