A Probabilistic Cohort-Component Model for Population Forecasting – The Case of Germany

The future development of population size and structure is of importance since planning in many areas of politics and business is conducted based on expectations about the future makeup of the population. Countries with both decreasing mortality and low fertility rates, which is the case for most countries in Europe, urgently need adequate population forecasts to identify future problems regarding social security systems as one determinant of overall macroeconomic development. This contribution proposes a stochastic cohort-component model that uses simulation techniques based on stochastic models for fertility, migration and mortality to forecast the population by age and sex. We specifically focus on quantifying the uncertainty of future development as previous studies have tended to underestimate future risk. The model is applied to forecast the population of Germany until 2040. The results provide detailed insight into the future population structure, disaggregated into both sexes and age groups. Moreover, the uncertainty in the forecast is quantified as prediction intervals for each subgroup.


Introduction
The future development of the population structure is of immense importance since planning in many areas of politics and business is done based on expectations about the future composition of the population. Countries with low fertility and decreasing mortality rates, as is the case for most countries in Europe, particularly need accurate population forecasts since these demographic changes transform the long-term age distribution of the population in favor of older persons. These changes result in widely discussed future problems, e.g., for the social security systems as well as the labor market as a whole. The public discussion about the demographic change in Germany and its challenges is mostly tinged with negative undertones (Deschermeier 2011: 669). Nevertheless, the transformation of a society also represents a very positive aspect: people are getting older while experiencing more healthy and active years of life compared to those in previous generations (Schnabel et al. 2005: 3).
There is a consensus among experts that the population in Germany will shrink and age in the long run (Wilke and Börsch-Supan 2009: 32; Deschermeier 2015: 106; Dudel 2014: 184; Fuchs et al. 2018: 48-49; Härdle and Myšičková 2009: 26; Lipps and Betz 2005: 32; Pötzsch and Rößger 2015: 15). Deaths have exceeded births in Germany since 1972, so without positive net migration the population would be shrinking (Swiaczny 2016: 158). Between 2009 and 2015, net migration into Germany increased monotonically, starting from a level of almost -56 thousand in 2008 and reaching a record of more than +1.139 million in 2015 (Bundesministerium des Innern 2017: 186). This trend was first induced by a combination of three factors.
First, the European debt crisis hit countries in Southern and Eastern Europe especially hard and led to major immigration from these regions to Germany and to Central and Northern Europe (Brücker et al. 2017a: 3). Second, there was a large increase in immigration from Afghanistan caused by a worsening security situation due to increased aggressiveness by the Taliban against the American military and civilians (Bundesministerium des Innern 2011: 107; International Organization for Migration 2014: 104). Third, there was a large spike in migration from Iraq following the start of the resurgence from the United States (U.S.) military, which caused more attacks from Islamist militias (Bundesministerium des Innern 2011: 107; Jaffe 2009). These trends continued in the following years, and the expansion of the European Union (EU) resulted in an increased influx of people from Southeastern Europe to Germany (Bundesministerium des Innern 2015: 14). Following the so-called Arab Spring, which started in 2011 in Tunisia, Islamists gained massive amounts of power due to the power vacuum appearing after the end of dictatorships in these countries (Council on Foreign Relations 2012). In 2014, the so-called Islamic State (IS) had rapid and surprising military success, especially in Syria and Iraq, where it proclaimed a caliphate (Heidelberg Institute for International Conflict Research 2017: 189). Many people subsequently fled from these regions, leading to record refugee migration into Germany in 2015. This migration was fueled by Chancellor Merkel's decision to simplify the asylum process and to essentially guarantee people from Syria legal refugee status. These changes subsequently motivated many people in places such as Serbia, Albania, Kosovo and Iran to try their chances as refugees as well. Some of these refugees even immigrated illegally using false identification documents to pose as Syrians (Aust et al. 2015; Bewarder and Leubecher 2016; Bundesamt für Migration und Flüchtlinge 2016: 14-50; Zeit Online 2015).
Against this background, this contribution provides a stochastic population forecast of the year-end population in Germany through the year 2040. The population in each year of the forecast is broken down by sex and age for the range 0 to 115 years. We use stochastic modeling approaches developed in past contributions (Vanella 2017; Deschermeier 2018, 2018b) to forecast the demographic components of the population development. These forecasts are used to estimate the growth in the age- and sex-specific population, starting from the estimated population on December 31, 2016. In this way, we generate 10,000 sample paths for the future population by running a probabilistic cohort-component model via Monte Carlo simulation of Wiener processes of the demographic components.
Stochastic approaches are gaining popularity as an alternative to the common deterministic population projections that use scenarios to address future uncertainty (Keilman et al. 2002: 410). Planners and decision makers need to know which future path is most likely to occur.
Stochastic forecasts based on simulations are less prone to subjective decision-making, since the results show a wide range of possible scenarios and quantify them probabilistically. As a result, the risk of personal misjudgment by the modelers is reduced. Our model returns not only the median age- and sex-specific population up to the year 2040 but also quantifies the uncertainty in the forecast, illustrated with 75% and 90% prediction intervals (PIs) for each year, age and sex.
The next section presents a condensed historical overview of the evolution of the cohort-component method for population updating and past advances in population projection, starting with the first deterministic models and continuing with improvements to these models through probabilistic forecasting. Our study primarily focuses on Germany; therefore, our overview gives special emphasis to population projections for Germany. In Section 3, we describe the population forecast process in detail by explaining how the demographic components fertility, migration and mortality are forecast and how these individual forecasts are combined into an overall population forecast for Germany via a probabilistic cohort-component model. Section 4 presents and discusses the results, and Section 5 provides an outlook and discusses the limitations of the presented approach.

Selected Population Forecasts and Projections with Special Emphasis on Germany
Future population projections are often conducted by deterministic cohort-component models.
To the best of the authors' knowledge, this method dates to 1863, when the Census Bureau of England and Wales (1863) ran a projection of the population in England and Wales for the year 1881 by 20-year age groups. Births, deaths and migrations were identified as the components of demographic development. The population was projected by making assumptions about changes in birth rates, mortality rates and net migration for each age group or cohort. Cannan used trends in age-specific fertility derived from recent census data and projected the population in England and Wales until the year 1951. Whereas Cannan's approach implicitly modeled international migration in combination with deaths, Whelpton (1928: 255-270) incorporated expectations of migration in a forecast of the U.S. population by age group, sex and ethnicity until the year 1975, setting the stage for modern cohort-component modeling.
Deterministic methods consider a limited number of scenarios whose likelihoods of occurrence are not quantified. Therefore, stochastic methods are recommended for population forecasting (Alho and Spencer 2005: 2-3; Bomsdorf et al. 2008: 125; Keilman et al. 2002: 410-412; Lee 1998: 157-170; Lutz and Scherbov 1998: 83). Ledermann and Breas (1959: 637-681) proposed the transformation of age-specific mortality rates (ASMRs) into indices through singular value decomposition, which was developed geometrically by Pearson at the beginning of the 20th century (1901: 559-563). They were thus the first to use principal component analysis (PCA) to reduce the high dimensionality in demographic processes. Le Bras and Tapinos (1979: 1405-1449) [...] Bozik and Bell (1987) laid the groundwork for stochastic modeling by applying autoregressive integrated moving average (ARIMA) models to forecast age-specific fertility rates (ASFRs) in the United States. Bell and Monsell (1991: 156-157) applied this method to forecasting ASMRs. Lee and Carter simplified the Bozik-Bell and Bell-Monsell approaches to forecast age-specific mortality (Lee and Carter 1992: 660-668) and fertility rates (Lee 1993: 190-199) in the U.S. Since then, various modifications of the Lee-Carter model have been proposed (see, e.g., Booth 2006: 554-562; Booth et al. 2006: 290-304 for an extensive overview), perhaps most notably the functional PC approach of Hyndman and Ullah (2007: 4945-4952).
Many population projections and forecasts have been made for Germany during the past half-century; the best known is the "koordinierte Bevölkerungsvorausberechnung" from the German Federal Statistical Office (Destatis). The first version was published in 1966. Since then, twelve updates have been made with improved techniques. The basic principle involves making a set of assumptions about the long-term development of life expectancy, total fertility rate (TFR) and net migration (currently two alternatives for each) to derive age-specific statistics. Probabilistic population forecasts for Germany are rare. To the best of the authors' knowledge, the first approach was undertaken by Lutz and Scherbov (1998: 83-91). Their idea was to pool a large number of earlier deterministic projections and to approximate the distributions of the parameters by assuming Gaussian distributions. Lutz and Scherbov investigated nine population projections for Germany and derived distributions for the TFR, life expectancy and net migration. On the basis of these summary statistics and assumptions about the distributions of the age-specific rates, they calculated empirical quantiles for the population size via scenario-based simulation to obtain projection intervals through 2050. This method is very attractive when a sufficient statistical basis for inference is lacking but appears rather subjective since it is built upon the scientists' assessment of the future course of the demographic components.
Subjective judgment generally has a high potential for error since it is not necessarily connected to statistical data. Furthermore, individuals experience difficulties in translating their qualitative judgment about realistic future scenarios into quantitative probabilities (Lee 1998: 168-170). Lipps and Betz (2005: 11-38) produced separate forecasts for the population in West and East Germany for the period 2002-2050, assuming convergence of the mortality and fertility rates in the East towards the levels in the West. They simulated 500 trajectories for a mortality index, the TFR and net migration. The age-specific mortality rates were derived through the classic Lee-Carter index, and the TFR was assumed to follow a random walk process. ASFRs were deduced from the TFR with a variable Gaussian ASFR distribution.
The net migration was modeled as an autoregressive process of order one (AR(1)). Age-specific migration was then calculated via a distributional assumption. The simulation of the time series processes produced 500 trajectories with PIs of the age- and sex-specific populations of West and East Germany.
This contribution was a major improvement on previous approaches. A general limitation of models using a fixed age schedule for the ASFR, as assumed by Lipps and Betz, is that they ignore the tempo effect in fertility, which describes the postponement of child-bearing into later points in life (e.g., Vanella and Deschermeier 2019). They assume that the mother's mean age at birth will converge to 31.45 years in the long run. This approach is quite restrictive and, at least from today's perspective, not realistic at 31.45 years. Quantification of the PIs for this statistic seems problematic since the variance in the forecast is apparently constant and has the same value for 2002 and 2050. Uncertainty about the far future is probably greater than that for the near future (Box et al. 2016: 129-147). Bomsdorf et al. (2008: 125-128) used ARIMA models to forecast the TFR and the net migration in Germany. They used these summary measures to derive ASFRs and age-specific migration via age schedules, namely, a Beta distribution for the ASFRs. Age- and sex-specific measures for mortality and net migration were obtained from the Lee-Carter model, and 5,000 simulations of the time series models produced empirical PIs. Härdle and Myšičková (2009: 4-26) applied the Lee-Carter models for mortality and fertility to estimate these two components for Germany. Furthermore, they forecast immigration to and emigration from Germany with separate AR(1) models to estimate the population in Germany until the year 2057. Dudel (2014: 95-216) non-parametrically forecast the population of West and East Germany until 2060 using historical simulation techniques based on 1,000 trajectories. His method, although statistically interesting, has a few caveats. First, the mortality model assumes a perfect correlation between the two genders, which statistically is unlikely (see, e.g., Vanella 2017: 543-552).
Although different developments in mortality are evident for both sexes, mostly arising from different smoking (Pampel 2005: 461-463; Trovato and Lalu 1996: 31-35; Waldron 1993: 458-460) and nutritional (Luy and Di Giulio 2006: 1-8; World Health Organization 2015) behaviors, the main trends in mortality reduction result from advances in medicine and better education among the population with regard to health and hygiene. Females and males both benefit from these improvements (Pötzsch and Rößger 2015: 34). Second, Dudel rejects trajectories for the TFR under 1 and over 3, censoring the total density. A pre-specified transformation would have mitigated this problem from the very beginning. Third, the overall migration model can be criticized because it assumes a fixed age schedule (which is unlikely) and PIs whose width remains almost constant over time instead of increasing, which has been pointed out as a limitation for earlier studies as well. Deschermeier (2015, 2016) forecasts the total population of Germany until 2035. He uses a model designed by Hyndman and Ullah (2007) to forecast the ASFRs and applies an advanced version of Hyndman et al. (2013) to forecast ASMRs and net migration. Although the model appears promising, it also underestimates the uncertainty in the forecast. Hyndman's approach smooths the data against outliers, which may be reasonable in some cases to obtain better estimates for the mean prediction. The problem with this method is that this smoothing ignores the probability of future outliers and therefore effectively underestimates the future uncertainty by simply stating that already observed outliers cannot appear again in the future. Nevertheless, the authors appear to underestimate the uncertainty as well, as the PIs of the TFR and net migration remain essentially constant after 2020.
Considering the high stochasticity of international migration (see Vanella and Deschermeier 2018: 273-276), proposing some assumptions on the future course is understandable, especially regarding the long-term convergence of net migration toward a certain level. A decrease in net migration by approximately 750,000 from 2015 to 2016, as assumed in the mean, has never been observed for Germany since World War II. Furthermore, the assumption that net migration will increase significantly again after such a heavy decrease appears questionable.
A general problem of many studies is the probable underestimation of the future risk in the population forecasts. Some models quantify risk by qualitative judgment, which is very difficult to translate into numerical probabilities, as shown earlier. On the other hand, the presented quantitative studies mostly use the Lee-Carter model for forecasting, which is mostly sufficient for the mean but naturally leads to underestimation of future risk, as this model only considers a small number of the PCs. The risk explained by the other PCs is thus ignored in the analysis, leading to a systematic underestimation of the future uncertainty. Many models do not quantify the uncertainty in migration at all, which is especially problematic, as international migration is the most uncertain of all demographic components. The overview of the relevant literature shows that approaches for population forecasting for the case of Germany that model all three demographic components by age and sex stochastically do not yet exist, with the exception of Azose et al. (2016). Our contribution is to propose an approach that is not only fully probabilistic but also considers the autocorrelations and cross-correlations of the demographic rates.

Method and Data
In this section, we propose a population forecast based on a probabilistic cohort-component model. The partial models for the demographic components are explained briefly; the original sources provide a more detailed description of the models and their results. First, the age-, sex-, and nationality-specific net migration figures are forecast as in Vanella and Deschermeier (2018). The data used are synthetic net migration figures by year of age (0-105), sex (binary) and nationality group. The nationalities are split into seven groups: Germans, EU or Schengen citizens excluding Germany, Third-Country Europeans, Africans, Asians, citizens from the Americas or Oceania ("Overseas"), and finally persons with no clear information on their citizenship, either because it is unknown or because they have none ("NA"). We call the dataset synthetic because it does not exist as such but is rather estimated from different sources used by Vanella and Deschermeier (201) in their study. It is estimated from two datasets provided by Destatis: the first includes age-specific migration data by sex, divided into Germans and non-Germans (Destatis 2015a, 2016a, 2017a, 2018a), and the second is disaggregated by nationality and five age groups (Destatis 2017b, 2018b). Vanella and Deschermeier (2018: 266-267) derived the synthetic dataset used for the analysis from these two datasets; the exact method is outlined in Vanella and Deschermeier (2018: 264-271). The base time period is 1990-2016. We run a principal component analysis (PCA) on the derived 1,484 age-, sex-, and nationality-specific net migration (ASNSNM) figures. The first two principal components (PCs) were identified as a labor market index and an index for crises (Vanella and Deschermeier 2018: 267-270). The loadings of the PCs for both sexes and the different nationality groups are given in Figure 1 and Figure 2.
Vanella and Deschermeier (2018: 268) identified the first PC as an index of labor migration due to the high positive loadings on European and Asian net migration alongside high negative loadings on Germans in the working age group.

Source: Own calculation and design
The loadings of the second PC are non-positive, thus addressing the overall net migration level.
In combination with the historical course, Vanella and Deschermeier (2018: 269-270) argue that the absolute value of the PC is especially large in times of significant crises and therefore refer to it as the Crises Index.
The historical course together with the forecast of these two variables through 2040 is plotted in Figure 3.

Source: Own calculation and design
The Labor Market Index has an increasing long-term trend on average and includes cyclical effects, which are typical for labor markets. The Crises Index is assumed to converge towards its mean during the base period due to a lack of better knowledge. The models are fit via ordinary least squares regression in the first step. The resulting noise is estimated with ARIMA models, which are then used for future simulation to consider the uncertainty in the forecast.
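The two-step procedure described above (a regression fit followed by a time series model for the remaining noise) can be sketched as follows. This is a minimal illustration, not the authors' implementation: it assumes a linear trend, replaces the ARIMA noise model with a simple AR(1) process, and uses a short made-up index history. The function names `fit_linear_trend`, `fit_ar1` and `simulate_paths` are hypothetical.

```python
import random

def fit_linear_trend(y):
    """OLS fit of y_t = a + b*t for t = 0..n-1; returns (a, b)."""
    n = len(y)
    t_mean = (n - 1) / 2
    y_mean = sum(y) / n
    sxx = sum((t - t_mean) ** 2 for t in range(n))
    sxy = sum((t - t_mean) * (v - y_mean) for t, v in enumerate(y))
    b = sxy / sxx
    return y_mean - b * t_mean, b

def fit_ar1(resid):
    """Least-squares AR(1) coefficient and innovation standard deviation."""
    num = sum(resid[i] * resid[i - 1] for i in range(1, len(resid)))
    den = sum(r * r for r in resid[:-1])
    phi = num / den
    innov = [resid[i] - phi * resid[i - 1] for i in range(1, len(resid))]
    sigma = (sum(e * e for e in innov) / len(innov)) ** 0.5
    return phi, sigma

def simulate_paths(y, horizon, n_paths, seed=0):
    """Simulate future paths as extrapolated trend plus simulated AR(1) noise."""
    rng = random.Random(seed)
    a, b = fit_linear_trend(y)
    resid = [v - (a + b * t) for t, v in enumerate(y)]
    phi, sigma = fit_ar1(resid)
    n = len(y)
    paths = []
    for _ in range(n_paths):
        noise, path = resid[-1], []
        for h in range(1, horizon + 1):
            noise = phi * noise + rng.gauss(0.0, sigma)
            path.append(a + b * (n - 1 + h) + noise)
        paths.append(path)
    return paths

# Hypothetical index history (not the paper's data); 24 steps cover 2017-2040.
history = [0.0, 1.2, 1.9, 3.1, 3.8, 5.2, 6.1, 6.9]
paths = simulate_paths(history, horizon=24, n_paths=200)
```

Separating trend and noise in this way keeps the deterministic part interpretable while the simulated noise carries the forecast uncertainty.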
The remaining 1,482 PCs are assumed to be random walk processes and are simulated accordingly. The resulting 10,000 trajectories of the future course of the PCs are transformed back into forecasts of the ASNSNMs through 2040, which are then finally aggregated by sex and age for the net migration forecast. The results of the forecasts are presented in Section 4 among other simulation outcomes.
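The simulation of the PCs as random walks and the transformation back to the original variables can be sketched in a few lines. The example below is an illustration under stated assumptions only: it uses two variables instead of 1,484, drift-free Gaussian random walks, and made-up loadings, means and volatilities. The names `simulate_random_walk` and `back_transform` are hypothetical.

```python
import random

def simulate_random_walk(last_value, sigma, horizon, rng):
    """Simulate one path of a drift-free Gaussian random walk."""
    path, value = [], last_value
    for _ in range(horizon):
        value += rng.gauss(0.0, sigma)
        path.append(value)
    return path

def back_transform(scores, loadings, means):
    """Map PC scores for one year back to the original variables.

    scores:   list of K principal component values
    loadings: K lists of length N (one loading vector per PC)
    means:    N variable means removed before the PCA
    """
    return [
        means[j] + sum(scores[k] * loadings[k][j] for k in range(len(scores)))
        for j in range(len(means))
    ]

rng = random.Random(42)
horizon, n_paths = 24, 1000            # 2017-2040; fewer paths than the 10,000 in the text
loadings = [[0.6, 0.8], [-0.8, 0.6]]   # hypothetical orthonormal loadings
means = [100.0, 50.0]                  # hypothetical variable means
last_scores, sigmas = [5.0, -2.0], [1.0, 0.5]

final_year = []
for _ in range(n_paths):
    pc_paths = [simulate_random_walk(s, sd, horizon, rng)
                for s, sd in zip(last_scores, sigmas)]
    scores_last = [p[-1] for p in pc_paths]
    final_year.append(back_transform(scores_last, loadings, means))
```

Because every simulated variable is a linear combination of the same PC paths, the cross-correlations between the variables are carried over into each trajectory.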
The model for mortality is based on Vanella (2017), where the Destatis data on deaths by sex and age (Destatis 2016b, 2017c, 2018c) as well as the end-of-year population for the years 1952-2016 (Destatis 2015b, 2015c, 2015d, 2016c, 2017d, 2018d) are used to estimate age- and sex-specific mortality rates (ASSMRs) for 0-94 year-olds. This procedure has the advantage of deriving adjusted ASSMRs, which include changes in the population due to international migration, directly from our mortality measure. The timing of migration is covered by the ASSMRs, assuming that the future timing is similar to the timing observed in the past. The ASSMRs for ages 95 and over are estimated by non-linear least squares fitting of logistic models until age 115, following Thatcher et al. (1998). Age- and sex-specific survival rates (ASSSRs) result from subtracting the corresponding ASSMRs from 1. A PCA is performed for the ASSSRs.
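The extrapolation of old-age mortality and the derivation of survival rates can be illustrated with a small sketch. Note the assumptions: instead of the non-linear least squares fit mentioned in the text, the example fits a two-parameter logistic curve by ordinary least squares on the logit scale, which is a simpler stand-in; the fitting ages 85-94 and the input rates are made up, and the names `fit_logit_line` and `logistic_rate` are hypothetical.

```python
import math

def fit_logit_line(ages, rates):
    """OLS fit of logit(rate) = a + b*age; returns (a, b)."""
    z = [math.log(m / (1.0 - m)) for m in rates]
    n = len(ages)
    x_mean, z_mean = sum(ages) / n, sum(z) / n
    b = (sum((x - x_mean) * (v - z_mean) for x, v in zip(ages, z))
         / sum((x - x_mean) ** 2 for x in ages))
    return z_mean - b * x_mean, b

def logistic_rate(age, a, b):
    """Mortality rate implied by the logistic curve at a given age."""
    return 1.0 / (1.0 + math.exp(-(a + b * age)))

# Hypothetical observed rates for ages 85-94, generated from a known curve.
ages = list(range(85, 95))
observed = [logistic_rate(x, -12.0, 0.12) for x in ages]

a, b = fit_logit_line(ages, observed)

# Extrapolated mortality rates for ages 95-115 and the matching survival rates.
assmr_old = {x: logistic_rate(x, a, b) for x in range(95, 116)}
asssr_old = {x: 1.0 - m for x, m in assmr_old.items()}
```

A logistic curve keeps the extrapolated rates strictly between 0 and 1, which an exponential extrapolation would not guarantee at the highest ages.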

Source: Own calculation and design
The development of the Lee-Carter Index shows a general trend of decreasing mortality over all age groups. The expected increase in the Behavioral Index reflects convergence in nutritional and smoking behaviors between males and females.
Regarding fertility, we use data on age-specific births among individuals aged 15 to 49 for the years 1968 to 2016 provided by Destatis directly or downloaded from GENESIS-Online (GENESIS-Online Datenbank 2018d; Destatis 2007, 2014a, 2014b) together with the age-specific data on the female population of reproductive age. Specific birth data on younger [...] as suggested by Vanella and Deschermeier (2019). We derive age-specific fertility rates (ASFRs) by dividing age-specific births by the corresponding mean age-specific female population for the respective year. As proposed by Vanella and Deschermeier (2019), we run a PCA on the ASFRs for mothers aged 13-54 years for the base period 1968-2016. This time horizon was proposed in that paper because it shows fertility developments after the second wave of the feminist movement (Hertrampf 2008).
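As a small worked example of the rate calculation, the following sketch divides births by the mean female population of the same age. The text does not spell out how the mean population is formed; the example assumes the average of the populations at the start and the end of the year. All numbers are made up, and the function names are hypothetical.

```python
def age_specific_fertility_rates(births, females_start, females_end):
    """ASFR per age = births / mean female population during the year.

    The mean is taken as the average of the start-of-year and end-of-year
    female population of that age (an assumption made for this sketch).
    """
    rates = {}
    for age, n_births in births.items():
        mean_pop = 0.5 * (females_start[age] + females_end[age])
        rates[age] = n_births / mean_pop
    return rates

def total_fertility_rate(asfr):
    """TFR as the sum of the single-year age-specific fertility rates."""
    return sum(asfr.values())

# Hypothetical data for two ages only.
births = {29: 50.0, 30: 60.0}
females_start = {29: 1000.0, 30: 900.0}
females_end = {29: 1000.0, 30: 1100.0}

asfr = age_specific_fertility_rates(births, females_start, females_end)
tfr = total_fertility_rate(asfr)
```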

Source: Own calculation and design
PC is associated with the general quantum of fertility and is to some extent influenced by family policy (Vanella and Deschermeier 2019). Figure 6 illustrates the loadings of these two PCs. Figure 7 shows the historical courses of these two variables with the forecast until the year 2040.

Source: Own calculation and design
The forecast of the "Policy Index", which addresses the quantum of fertility, is conditional on the assumption that real financial transfers in family policy are kept constant on the level of enous. Making assumptions about future investments is not the goal of this paper. We can then derive the results from a status quo scenario.
The gender of the children will be simulated after computing the birth numbers. Therefore, we calculate the ratio of males among all live births annually based on the sex-specific birth numbers in Germany from 1950 to 2016 extracted from GENESIS-Online (GENESIS-Online Datenbank 2018a). We then fit a logistic ARIMA model to the data for simulation of the birth ratio until 2040. The ratio's historical course alongside the median forecast and 75% PIs is given in Figure 8.
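The idea behind a logistic time series model for a ratio can be sketched as follows: the ratio is mapped to the real line by a logit transformation, a time series model is simulated on that scale, and the paths are mapped back so that every simulated ratio stays between 0 and 1. The example assumes the simplest possible specification, a random walk with drift on the logit scale, which is not necessarily the ARIMA order used in the study. The history is made up, and `simulate_ratio` is a hypothetical name.

```python
import math
import random

def logit(p):
    return math.log(p / (1.0 - p))

def inv_logit(z):
    return 1.0 / (1.0 + math.exp(-z))

def simulate_ratio(history, horizon, n_paths, seed=1):
    """Simulate ratio paths from a random walk with drift on the logit scale."""
    z = [logit(p) for p in history]
    diffs = [z[i] - z[i - 1] for i in range(1, len(z))]
    drift = sum(diffs) / len(diffs)
    sigma = (sum((d - drift) ** 2 for d in diffs) / (len(diffs) - 1)) ** 0.5
    rng = random.Random(seed)
    paths = []
    for _ in range(n_paths):
        value, path = z[-1], []
        for _ in range(horizon):
            value += drift + rng.gauss(0.0, sigma)
            path.append(inv_logit(value))   # back to the (0, 1) scale
        paths.append(path)
    return paths

# Hypothetical share of boys among live births (not the GENESIS-Online data).
history = [0.5150, 0.5146, 0.5147, 0.5142, 0.5140, 0.5141, 0.5136, 0.5134]
ratio_paths = simulate_ratio(history, horizon=24, n_paths=500)
```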

Sources: GENESIS-Online Datenbank 2018a; Own calculation and design
An apparent trend of a decreasing ratio of male births is evident over the analyzed horizon.
This trend can also be observed in other industrialized countries since at least the 1970s (Davis et al. 2007: 941-943; James 2000: 1179-1182). Although various studies individually report some evidence that environmental factors such as weather (Helle et al. 2008), exposure to toxins (see, e.g., Davis et al. 2007: 941-942), and nutritional behavior (Mathews et al. 2008: 1662-1666) have some influence on a baby's sex, none of the findings explain the observed trends of decreasing ratios of male births. Considering the clear basic trend since 1950, assuming that the trend will continue over the forecast horizon is plausible.
All described models are based on principal component time series models and thus include autocorrelations in the time series alongside cross-correlations among the age- and sex-specific demographic rates and numbers. Let $P_{x,y,g,t}$ denote the population aged x years at the end of year y for sex g in trajectory t. The population update is performed through the following step-wise process.
Step I: The forecast begins with an adjustment of the base population with regard to international migration flows in the first forecast year y+1. Adding the international net migration aged x+1 years of sex g during year y+1 in trajectory t ($M_{x+1,y+1,g,t}$) to $P_{x,y,g,t}$ leads to the hypothetical subpopulation $\tilde{P}_{x+1,y+1,g,t}$ at the end of year y+1 without any deaths: $\tilde{P}_{x+1,y+1,g,t} = P_{x,y,g,t} + M_{x+1,y+1,g,t}$.
In contrast to many applications, we use the ASSSRs of the contemporary year since our mortality model simulates the ASSSRs based on the population at the end of the current year, not the one before.
Step IV: The live births $B_{y+1,t}$ are estimated from the ASFRs and the corresponding female population, where $f_{x,y+1,t}$ denotes the ASFR for females aged x years in year y+1 in trajectory t.
In this way, the population by sex and age in year y+1 in trajectory t is obtained. This process is then used to stochastically forecast the population by sex and age until the year 2040. The algorithm is illustrated in Figure 9.
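One annual update for a single trajectory can be sketched as below. This is an illustration under several assumptions, not the authors' algorithm: only Steps I and IV are described in detail in the text, so the way survival rates are applied, the use of the mean female population as exposure for births, the treatment of newborns and the dropping of the oldest age are choices made for this example. The function name `update_population` and all inputs are hypothetical.

```python
def update_population(pop, migration, survival, asfr, male_ratio):
    """Advance the population of one trajectory by one year.

    pop[g][x]:       population aged x at the end of year y, g in {"f", "m"}
    migration[g][x]: net migration aged x at the end of year y+1
    survival[g][x]:  survival rate applied to the hypothetical population aged x
    asfr[x]:         fertility rate of women aged x in year y+1
    male_ratio:      share of boys among live births
    """
    max_age = len(pop["f"]) - 1
    new_pop = {g: [0.0] * (max_age + 1) for g in ("f", "m")}
    deaths = 0.0
    for g in ("f", "m"):
        for x in range(max_age):  # persons at the maximum age drop out (assumption)
            # Hypothetical population without deaths: aged cohort plus net migration.
            hypothetical = pop[g][x] + migration[g][x + 1]
            survivors = hypothetical * survival[g][x + 1]
            deaths += hypothetical - survivors
            new_pop[g][x + 1] = survivors
    # Births: ASFR times the mean female population of that age (assumption).
    births = sum(rate * 0.5 * (pop["f"][x] + new_pop["f"][x])
                 for x, rate in asfr.items())
    # Split births by sex, add net migration of infants, apply infant survival.
    for g, share in (("m", male_ratio), ("f", 1.0 - male_ratio)):
        hypothetical = births * share + migration[g][0]
        survivors = hypothetical * survival[g][0]
        deaths += hypothetical - survivors
        new_pop[g][0] = survivors
    return new_pop, births, deaths

# Toy example with ages 0-3 only.
pop = {"f": [10.0] * 4, "m": [10.0] * 4}
migration = {"f": [1.0, 0.0, 0.0, 0.0], "m": [0.0] * 4}
survival = {"f": [1.0, 1.0, 1.0, 0.5], "m": [1.0] * 4}
asfr = {1: 0.5, 2: 0.5}
new_pop, births, deaths = update_population(pop, migration, survival, asfr, 0.5)
```

Repeating this update year by year, with freshly simulated rates in each trajectory, yields one sample path of the population by age and sex.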

Population Development in Germany until 2040
The combination of the resulting trajectories for the demographic components as explained in Section 3 results in a probabilistic cohort-component model for forecasting the age- and sex-specific population for the ages 0-115 years. The initial population for the forecast is the age- and sex-specific population reported by Destatis for December 31, 2016 (Destatis 2018d). Population numbers for ages 100 years and older are not available in detail but are instead aggregated into an upper age group. Therefore, we estimated the population in this age group through geometric extrapolation until age 115.
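One way to read the geometric extrapolation is sketched below: the open-ended age group is spread over the single ages 100-115 so that each age holds a fixed fraction of the previous one and the total is preserved. The exact procedure is not given in the text, so the constant ratio, its value and the function name `split_open_age_group` are assumptions made for illustration.

```python
def split_open_age_group(total, ratio, first_age=100, last_age=115):
    """Distribute an open-ended age group over single ages with a geometric decline.

    total: population in the aggregated group (here: ages 100 and older)
    ratio: assumed constant ratio between successive single ages, 0 < ratio < 1
    """
    weights = [ratio ** k for k in range(last_age - first_age + 1)]
    scale = total / sum(weights)
    return {first_age + k: w * scale for k, w in enumerate(weights)}

# Hypothetical numbers: 15,000 persons aged 100+, each age 55% the size of the previous one.
by_age = split_open_age_group(15000.0, 0.55)
```

In practice, the ratio could be estimated from the highest single ages that are still reported individually.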
In Section 3, we described the partial models for forecasting of the demographic components.
Now, we provide a selection of the results from the forecast. The overview is kept short since [...] The fertility model results in 10,000 trajectories for all ASFRs. Multiplying the ASFRs by the corresponding female population completes the birth forecast. The results are given in Figure 10.

Sources: GENESIS-Online Datenbank 2018a; Own calculation and design
The increasing trend in births, as witnessed since 2012, is expected to continue until 2020.
Birth numbers will probably decrease moderately thereafter because most children are born to mothers over 29 years of age, as shown by Vanella and Deschermeier (2019). This decrease can therefore be explained by the decreasing number of births at the beginning of the 1990s, as shown on the left-hand side of the graph. The median increase during the second half of the 2030s stems from a slightly increasing TFR together with almost stagnating birth numbers in the cohorts 2005 to 2011, which by then will be in their reproductive phase.

Sources: GENESIS-Online Datenbank 2018b; Own calculation and design
Similarly, the death numbers are derived from the ASSSRs and the population update. As shown in Figure 9, deaths can be derived by simulating the hypothetical age- and sex-specific population at the end of some period in some trajectory without deaths and then multiplying this number by the respective adjusted ASSMR to derive the actual number of deaths among this group. The resulting death numbers are illustrated in Figure 11.
This results from the strong birth cohorts of the 1960s and early 1970s, who by then will all be over 60 years of age, roughly the age at which mortality risks start increasing strongly. The numbers are expected to decrease slightly until the late 2030s, after which another increase will occur on average. At that point, many of the immigrants who have come to Germany since reunification will be in their 60s or older and will therefore face higher mortality risks themselves.
By subtracting the death numbers from the birth numbers, we calculate the natural population growth, whose forecast can thus be derived indirectly from the birth and death forecasts.

Sources: GENESIS-Online Datenbank 2018a, 2018b; Own calculation and design
The results of the natural population growth forecast are illustrated in Figure 12. A slight negative tendency is probable. With high likelihood, deaths will exceed births over the forecast horizon. The simulation study gives a probability of just 15.9% for the birth numbers to exceed the death numbers in 2040.
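Probabilities and prediction intervals of this kind follow directly from the simulated trajectories: a probability is the share of trajectories in which an event occurs, and a PI is bounded by empirical quantiles. The sketch below shows both calculations on made-up birth and death numbers; the function names and the linear-interpolation quantile rule are assumptions made for this example, not the authors' code.

```python
import random

def empirical_quantile(values, q):
    """Linear-interpolation quantile of a sample, 0 <= q <= 1."""
    s = sorted(values)
    pos = q * (len(s) - 1)
    lo = int(pos)
    hi = min(lo + 1, len(s) - 1)
    return s[lo] + (pos - lo) * (s[hi] - s[lo])

def prediction_interval(values, level):
    """Central prediction interval, e.g. level=0.9 for a 90% PI."""
    tail = (1.0 - level) / 2.0
    return empirical_quantile(values, tail), empirical_quantile(values, 1.0 - tail)

def exceedance_probability(births, deaths):
    """Share of trajectories in which births exceed deaths."""
    return sum(b > d for b, d in zip(births, deaths)) / len(births)

# Hypothetical simulated totals for one forecast year (in thousands).
rng = random.Random(7)
births = [rng.gauss(780.0, 60.0) for _ in range(10000)]
deaths = [rng.gauss(950.0, 50.0) for _ in range(10000)]

natural_growth = [b - d for b, d in zip(births, deaths)]
pi_75 = prediction_interval(natural_growth, 0.75)
pi_90 = prediction_interval(natural_growth, 0.90)
p_births_exceed = exceedance_probability(births, deaths)
```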
The shrinking of the population due to natural population decrease is counterbalanced by international net migration. The forecast method for the ASNSNM numbers has been explained in Section 3; for illustration purposes, the results of the simulation are cumulated into the total net migration in Figure 13.

Sources: GENESIS-Online Datenbank 2018e; Own calculation and design
The median scenario gives a slightly decreasing net migration, although a cyclical course driven by the economic cycle is probable. The high uncertainty in migration forecasting is obvious, but a positive net migration is very likely overall. The median of net migration in 2040 is 255,334 persons. This is a higher balance than most previous projections provide, which were calculated before the record influx of 2015. As many of the larger cities of origin of the refugees, especially in Syria, have been largely devastated by war (McKenzie 2018; Pleitgen 2017), it seems unlikely that there will be mass emigration out of Germany in the years to come, as one might expect from the experience of past refugee crises. Furthermore, the results reflect the strong past development of the economy in Germany. This trend is likely to remain stable in the future (OECD 2017: 130-133). The attractive labor market is likely to attract more people in the future (Fuchs et al. 2018: 49-54), especially from within the EU due to the unrestricted free movement of workers (Vanella and Deschermeier 2018: 274-277). Total net migration in 2040 is estimated to be above zero with a probability of 76.77%.
Positive net migration, especially at younger ages, is highly important for filling the shortages occurring in the labor market due to population aging. We stress that the effect of migration on the labor market and the social security system very much depends on the skill level and education of the immigrants. Especially in cases of refugee migration, where education is often either relatively low or not recognized by German standards, it usually takes a long time for the immigrants to fully integrate into the labor market (Brücker et al. 2017b). Figure 14 shows the forecast of total population until the year 2040 with 75% and 90% PIs.

Sources: GENESIS-Online Datenbank 2018f; Destatis 2016d; Own calculation and design
In contrast to many earlier studies on Germany (see Section 2), the population is expected to increase moderately over the forecast horizon due to high, yet decreasing, net migration, an increasing TFR and decreasing mortality. Contrary to common belief, our findings provide no empirical evidence for a decrease in population size over the chosen forecast horizon. Although a decrease is a realistic outcome, its likelihood is relatively small. Table 1 shows the forecast and projection results for Germany for a selection of the studies mentioned in Section 2. The percentiles were chosen to allow comparison of our results to those of other studies without bias due to different quantiles. Studies not mentioned here, like Lipps and Betz (2005) or Deschermeier (2016), did not provide the corresponding percentiles or had a shorter forecast horizon. Our median forecast of the total population is substantially above those of the other studies.
Most of the presented studies were conducted before the refugee crisis that began in 2014 and the above-average net migration since 2010 caused by the European debt crisis. These developments mark significant changes in migration. At the time most of these studies were conducted, such developments could not have been foreseen. Our investigation suggests that the earlier studies, whose 90% PIs span 6-15 million persons, underestimate the uncertainty in the forecast. According to our forecast, ceteris paribus, the population in 2040 will be between 78.713 and 94.829 million people at a 90% probability level, with a median outcome of 86.647 million.
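How a median and a PI can be read off a set of simulated trajectories may be sketched as follows. This is an illustration under stated assumptions, not the authors' code: the helper names are hypothetical, the quantiles use simple linear interpolation between order statistics, and the population draws are synthetic values in millions that do not reproduce the reported results.

```python
import random

def empirical_quantile(sorted_values, q):
    """Quantile of pre-sorted values with linear interpolation between order statistics."""
    pos = q * (len(sorted_values) - 1)
    lo = int(pos)
    hi = min(lo + 1, len(sorted_values) - 1)
    frac = pos - lo
    return sorted_values[lo] * (1 - frac) + sorted_values[hi] * frac

def prediction_interval(values, level):
    """Return (lower bound, median, upper bound) of a central PI at the given level."""
    s = sorted(values)
    tail = (1 - level) / 2
    return (
        empirical_quantile(s, tail),
        empirical_quantile(s, 0.5),
        empirical_quantile(s, 1 - tail),
    )

# Synthetic total-population draws for one year, in millions (assumption: illustrative only).
rng = random.Random(0)
population = [rng.gauss(85.0, 5.0) for _ in range(10_000)]

for level in (0.75, 0.90):
    low, med, high = prediction_interval(population, level)
    print(f"{level:.0%} PI: [{low:.3f}, {high:.3f}], median {med:.3f}")
```

The width of the interval, `high - low`, is the quantity that is compared across studies when judging how much uncertainty a forecast conveys.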
In many applications (e.g., social security), the structure of the population is of higher importance than its size per se. Therefore, Figure 15 gives an overview of the age structure of the population in 2016 compared to the forecast in 2040 with PIs for both sexes.

Sources: Destatis 2018d; Own calculation and design
We observe the 25-year shift in the population. In general, there is greater uncertainty for males.
Whereas the retirement-age population can be predicted relatively well, the uncertainty in the future working-age population is rather large for males due to the higher uncertainty in the migration forecast for males relative to females (Vanella and Deschermeier 2018: 274-277). The uncertainty in the population of persons under 25 years of age mostly arises from the fact that this portion of the population has not yet been born but also from the relatively high uncertainty in international migration.
In addition to the overall age structure, the median age of the male and female populations is considered as a summary indicator for the future age distribution of the population. The median age of the population can be obtained from the simulation results because it is the exact age that cuts the population in half. This computation for all 10,000 trajectories can be used to extract PIs for the median age, similar to the computation of median life span conducted by Vanella (2017: 548-552). The results of our analysis are shown for both sexes in Figure 16. We observe a rejuvenation effect for the upcoming years due to high net migration during this period, as illustrated in Figure 13, and to increasing birth numbers, as shown in Figure 10. The high net migration around the year 2015 combined with the high forecast values for the upcoming years leaves a mark in the age structure of Germany. This can be seen in the age structure for the male and female populations in the year 2040. By that time, the majority of the population that immigrated during the high influx phase will be approximately 50 years old, while the baby boomer generation will be in their seventh decade of life. Over the forecast horizon, the median age traces this development through a rejuvenation effect for men and women. The probable decrease in the number of births after the early 2020s and decreasing net migration and mortality (Vanella 2017: 550) lead to an aging of the population structure, as represented by the increasing median ages after that point. Since a larger portion of migrants is male (Vanella and Deschermeier 2018: 274-276), the rejuvenation is stronger for males than for females.
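The median-age computation described above can be sketched as follows. This is a minimal illustration with several assumptions: the function name is hypothetical, ages within each one-year group are assumed to be uniformly distributed so that an exact (fractional) median age can be interpolated, the trajectories are synthetic, and the PI bounds use the default method of Python's `statistics.quantiles`, which need not match the authors' quantile definition.

```python
import random
import statistics

def median_age(pop_by_age):
    """Exact age that splits the population in half.

    pop_by_age[a] is the number of persons aged a in completed years.
    Ages are assumed to be uniform within each one-year group (assumption).
    """
    half = sum(pop_by_age) / 2
    cumulative = 0.0
    for age, count in enumerate(pop_by_age):
        if count > 0 and cumulative + count >= half:
            return age + (half - cumulative) / count
        cumulative += count
    raise ValueError("population is empty")

# Synthetic trajectories: 1,000 simulated age distributions for ages 0-99 (illustrative only).
rng = random.Random(1)
trajectories = [
    [max(0.0, rng.gauss(1000.0, 50.0)) for _ in range(100)]
    for _ in range(1000)
]

medians = [median_age(t) for t in trajectories]
cuts = statistics.quantiles(medians, n=20)  # 19 cut points at 5%, 10%, ..., 95%
lower, upper = cuts[0], cuts[-1]            # bounds of a 90% PI
point = statistics.median(medians)
print(f"median age {point:.2f}, 90% PI [{lower:.2f}, {upper:.2f}]")
```

Applying `median_age` separately to the male and female age distributions of each trajectory would give sex-specific intervals of the kind shown in Figure 16.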
As we have shown by some important measures, our model provides a wide range of detailed analyses targeting specific topics of interest. The forecast results offer the possibility for a wide range of future studies, e.g., analyzing the effects of population changes on social security, the labor market or housing demand.

Conclusions, Limitations and Outlook
This paper proposed a probabilistic cohort-component approach for population forecasting by sex and age. It was applied to predict the population of Germany until the year 2040. Germany witnessed a record migration influx in 2015 due to the refugee movement, especially from Syria, Iraq and Afghanistan, in combination with the challenging economic situation in many countries in Southern and Eastern Europe. The record net migration marks a considerable event for Germany's demographic development. The expected long-term decrease in the population does not appear to hold based on our findings. The results provide essential data on the consequences of the current trends for decision makers, planners and scientists.
The model predicts the population of Germany by age and sex until the year 2040. The forecast is conducted as a composite of three time series models based on PCA for the three demographic components: fertility, international migration and mortality by sex and age. The fertility model is conditional on political intervention as well, considering reforms in family policy to some extent. The method is specified for Germany, but it can be applied to other countries or regional units for which sufficiently long time series data for the demographic components are available. Stochastic modeling of the population produces point estimates of the future population in addition to a measure of the future uncertainty via prediction intervals. The results may be disaggregated or aggregated almost arbitrarily regarding sex, age and level of uncertainty.
The model is well-suited for regular updating and does not require large amounts of data input since it is restricted to demographic variables and uses official statistics provided by Destatis.
One interesting result is the detailed reporting and probabilistic quantification of the disaggregated population for all ages and both sexes; therefore, the results offer many possibilities for future forecast studies that require disaggregated population data as inputs, e.g., research on social security, life insurance or housing demand.
Our approach is restricted to quantitative methods; therefore, trends that were not observed in the past are not considered for the future. Nevertheless, for all demographic variables, the input data span a period at least as long as the forecast horizon; thus, we believe that all realistic trends that might be observed during the forecast horizon are included in the model. Expert knowledge could be added if the forecaster thinks that past trends insufficiently cover the possible future outcomes. The model suffers from a short input time horizon because the migration data go back only to 1990. Older data are not representative because of the very different overall geopolitical situation in Eurasia at that time. Furthermore, fertility is difficult to forecast since it is significantly influenced by policy. We tried to incorporate this effect into the model to some extent, following a ceteris paribus assumption in family policy to avoid bias as far as possible. Our forecast horizon is 2040 and not 2060 or 2100, as in other studies, since we do not intend to create misinterpretations for the far future, for which forecasts are not possible with the available data.
A longer forecast period would be interesting but cannot be achieved via responsible statistical modeling. The future availability of input data suited for model estimation will improve the quality of our models and allow for longer forecast horizons. Even with a forecast horizon that reaches only until 2040, the uncertainty is rather large. Most of the risk stems from the uncertainty about future net migration. Although the net migration model performs reasonably well, a possible extension of the model would be separate estimations of in- and out-migration.
Joint estimation of birth rates, survival rates and migration numbers (or rates in the case of outmigration) would represent another possible extension.
Empirical updating might be required if the development in the upcoming years differs from our forecast due to political or economic developments. Such structural breaks are not implemented in our simulation approach.