Regional prices in early twentieth-century Spain: a country-product-dummy approach

This paper explores regional price variation in early twentieth-century Spain. Using consumer price information from the bulletins published by the Instituto de Reformas Sociales between 1910 and 1920, we build a dataset with a total of 40,581 quotes covering 22 items for each of the 49 provinces. We then estimate provincial price levels following a country-product-dummy (CPD) approach. Our findings suggest that spatial price variation existed across Spanish provinces. In line with the Balassa–Samuelson conjecture, consumer prices and productivity levels were somewhat related. Additionally, we show that prices rose in all provinces after the outbreak of World War I. Moreover, it appears that this demand-shock brought about spatial asymmetries in price growth.


Introduction
Measurement remains the cornerstone of economics. The System of National Accounts (SNA), established after the Second World War, enabled deeper economic analysis and stimulated further research. With the creation of the International Comparison Program (ICP) in 1968, it became possible to make reasonable comparisons of per-capita income across countries. 1 However, subnational data are scant, even though there are noteworthy regional disparities in productivity and prices in large developing countries (Brazil, China, India, among others). In economic history there have been several efforts to reconstruct macroeconomic aggregates (e.g. GDP) at subnational level (Fukao et al. 2015; Rosés and Wolf, forthcoming). Nevertheless, few of these, if any, look at spatial price variation.
In the absence of regional prices, nominal GDP has conventionally been adjusted using a national deflator which, in the presence of spatial price variation, could bias interregional comparisons of per-capita income. Using national deflators is common practice even today. Eurostat publishes regional data in terms of the purchasing power standard (PPS), which is constructed at country level. 2 Indeed, the cost of the data collection process partly explains the scarcity of regional price levels. 3 Obviously, this limitation is even more acute for historical periods.
In economic history the literature on living standards has received considerable attention in recent years. Following in the footsteps of Angus Maddison and the Great Divergence debate (Pomeranz 2000), several studies have delved into historical wage and price data to explore living standards across major cities and regions (Allen 2001; Broadberry and Gupta 2006; Allen et al. 2011, 2012). This approach has shed further light on the matter, complementing Maddison's real per-capita GDP backward projections. Ideally, direct comparisons would be preferable, as Lindert (2016) points out. However, consumer price data are often unavailable or have limited coverage, especially before 1914. 4
This study hence explores regional prices in a historical context: early twentieth-century Spain. To do this we collected a large dataset of market prices (over 40,000 observations) from the bulletins of the Instituto de Reformas Sociales between 1910 and 1920. Inspired by the methodology used by the World Bank (2013), we then estimate subnational price levels using a time-adjusted country-product-dummy (CPD) model. Thus, the contribution of this work is twofold. First, this is, as far as we know, the first time this methodology has been used in economic history. Second, this approach permits us to quantify differences in price levels across Spanish provinces and also to assess empirically their evolution between 1910 and 1920.
Having said that, it is worth remembering that our period of study, 1910-1920, was a fundamental episode in the economic history of Spain. Although Spain remained neutral in World War I (WWI), the economic repercussions were noteworthy. In fact, it has been widely discussed that this demand-shock had an asymmetric impact across Spanish provinces (García-Delgado 1981). This, in turn, is compelling since national market integration was underway. In this regard, Rosés and Sánchez-Alonso (2004) found that real wage convergence was interrupted by WWI. Moreover, it appears that regional disparities bounced back in the 1910s (Rosés et al. 2010).
1 The ICP collects millions of prices from around the world to compile purchasing power parities (PPPs).
2 The European Union acknowledges the importance of having purchasing power parities (PPPs) (Regulation (EC) No 1445/2007). In addition, Eurostat requires spatial adjustment factors (SAFs) every 6 years to calculate PPPs using prices collected in various locations of each member state.
3 For recent work in this field see Aten (2017) for the USA, Biggeri et al. (2017) for China, Deaton and Dupriez (2011) for Brazil and India, and Majumder and Ray (2017) for India.
4 Emery and Levitt (2002) compile price indices for thirteen Canadian cities from 1900 to 1950. Chen and Devereux (2003) study price convergence across cities in the USA since 1918. For Spain, Rosés and Sánchez-Alonso (2004) construct provincial real wages from the mid-nineteenth century to 1930.
Overall, our results show the existence of spatial price variation, which is particularly relevant for interregional comparisons of material living standards. We also empirically assess the economic repercussions of WWI. In line with the existing literature, we find, on the one hand, that prices rose in all provinces after the outbreak of WWI. On the other hand, our study also shows that this demand-shock brought about asymmetric price growth. This finding casts doubt on existing literature and thereby calls for further research and discussion, especially as regards regional income disparities.
The remainder of the paper is structured as follows. In Sect. 2 we briefly describe the historical context. Then, Sect. 3 describes data and methodology. Our findings are then presented in Sect. 4, while Sect. 5 concludes with the study's main implications.

Historical background
During the period from the mid-nineteenth century to the Civil War (1936-1939), Spain entered the path of modern economic growth (Prados de la Escosura 2017). Although per-capita Gross Domestic Product (GDP) growth was initially unimpressive, it accelerated in the early decades of the twentieth century. In general, the adoption of novel technologies and the process of national market integration are regarded as the main engines of economic development. Even so, Spain lagged behind the European industrial core (Nadal 1975; Pollard 1981). Moreover, the profound socioeconomic transformations that accompanied industrialisation were only found in Catalonia and the Basque Country. 5 This partly explains the rise in regional income inequality in Spain (Rosés et al. 2010).
At the same time, the integration of the national market began in the second half of the nineteenth century. 6 Before then, Spain was essentially fragmented into various local and regional markets that were largely unconnected. Yet several reforms undertaken by liberal governments and the development of a transport network facilitated this process. In particular, liberal governments aimed at strengthening property rights and promoting interregional trade, which was constrained by several barriers. Also, the railway and further advances in other modes of transportation overcame geographic obstacles. 7 Several studies have looked at national market integration. Peña and Sánchez-Albornoz (1983), for instance, found a steady regional convergence in grain prices from the mid-nineteenth century. Castañeda and Tafunell (1993) documented a fall in the interregional variation in the interest rates of short-term bills of exchange, while Rosés and Sánchez-Alonso (2004) presented some evidence in support of real wage convergence between 1860 and 1914.
5 Whereas in Catalonia the cotton industry, with its long tradition stretching back to the eighteenth century, steadily mechanised, in the Basque Country the iron and steel industries set the industrialisation process in motion (Nadal and Carreras 1990).
6 See Rosés et al. (2010, pp. 245-246) for an account of the integration of the Spanish market between 1860 and 1930.
7 According to the calculations made by Herranz (2005), in 1878 there was a reduction of up to 86% in haulage costs thanks to the introduction of the railway.
In spite of the growing integration of the goods and factor markets in Spain, regional income inequality, measured in terms of per-capita GDP, rose during the second half of the nineteenth century, reaching its peak at the turn of the twentieth century (Rosés et al. 2010). Then, regional income convergence prevailed until the outbreak of WWI, when it came to a halt (Rosés et al. 2010). Likewise, convergence in real wages across Spanish provinces also appears to have been interrupted (Rosés and Sánchez-Alonso 2004). Interestingly, market prices more than doubled in the 1910s (Maluquer de Motes 2005). As Sudrià (1990) found, Spanish neutrality brought about a sharp and unexpected export boom which had a substantial impact on the balance of payments. Still, this demand-shock only affected specific manufacturing industries. In fact, traditional exports such as citrus fruit or mineral resources fell sharply.
Consequently, if WWI opened up new avenues for investment for specific industries, and hence provinces, where productivity was relatively high, then real wage convergence might have been disrupted. However, convergence could also have been affected through asymmetric inflation. That is to say, if inflation was more acute in some provinces, then this demand-shock would have exerted upward pressure on salaries in order to maintain purchasing power, creating a mirage of growing regional income inequality. This study thus focuses on market prices and aims at examining the extent of spatial price variation in a developing economy, such as early twentieth-century Spain, and extending our understanding of this historical episode.

Data and methodology
In economics there has been a long tradition of international comparisons of income, especially since the creation of the International Comparison Program (ICP) in 1968. 8 The ICP standardised a methodology and coordinated national statistics offices in order to produce spatial price deflators or purchasing power parities (PPPs). 9 Although the procedure to compute PPPs is rather technical, its fundamentals are not. National statistics offices, under the guidance of the ICP, design a representative basket of goods and services grouped under basic headings (BHs), e.g. bread, rice and so on. 10 BHs are the lowest level of aggregation for which expenditure data are available. Thus, the first step is to compute PPPs at BH level using price information at item level. Data availability usually determines whether item-specific prices or national averages are used, or whether weighting urban and rural prices is possible. The BH-PPPs and information on household expenditures are then used as inputs to compute aggregate measures of relative prices and volumes. 11 That said, it could be argued that the System of National Accounts (SNA), created in 1953, and household budget surveys developed simultaneously. In Spain the first household budget surveys were carried out in 1940, 1958 and 1964-1965. 12 Without information on household expenditures, the ICP methodology cannot be properly executed. 13 To overcome this, historical sources have been used to create a representative basket of goods and services with their expenditure weights (for Spain, Ballesteros-Doncel 1997a, b; Maluquer de Motes 2013). However, this approach is normally used for estimating a cost-of-living index (CLI) at national level and evaluating its evolution over time. But if the focus is on the spatial variation of prices and real income, a different methodology needs to be developed. 
In this study, we estimate subnational PPPs in early twentieth-century Spain using a large dataset of market prices and a country-product-dummy (CPD) model.

Data
Our price data come from bulletins published by the Instituto de Reformas Sociales (IRS) from 1910 to 1920. 14 Founded in 1903, the IRS was a governmental body whose purpose was to examine the condition of the working class and the relationship between labour and capital. In the late nineteenth century, the poor conditions of agricultural and industrial workers brought about social unrest, conflict and the so-called «social question» debate. In 1883 the Spanish government formed a Commission for Social Reforms, but to little avail. Two decades later the Commission gave way to the IRS, 15 which, although similar in purpose, had more muscle and resources to counter the mounting social problems. 16
11 When information on BH-PPPs and expenditure is available, the Gini-Éltetö-Köves-Szulc (GEKS) aggregation method is used. Before the 2005 ICP, the Geary-Khamis (GK) aggregation method was used.
12 Although the first Spanish household budget survey dates back to 1940, there was never any technical official publication of it (Celestino-Rey 2002).
13 As Deaton and Heston (2010) point out, the PPP estimations rely on there being suitable data and an appropriate multilateral price index that satisfies certain properties, such as reciprocity and transitivity. It is worth noting that: "As has been known, at least since Fisher, price indexes cannot satisfy all the properties that our price-based intuition suggests from them; price indexes are not prices" (Deaton and Heston 2010, p. 9).
14 Figure 6 in the "Appendix" shows the front page of the IRS bulletins.
15 Established during the government of Francisco Silvela, the Instituto de Reformas Sociales (IRS) came under the aegis of the Ministerio de Gobernación. Gumersindo de Azcárate, a distinguished member of the reformist Institución Libre de Enseñanza, was its first president.
16 The Instituto de Reformas Sociales (IRS) actively contributed to the development and enforcement of labour standards such as limiting the length of the working day to 8 hours, carrying out work inspections, reviewing foreign labour regulations, mediating between workers and companies, and developing an active policy to promote social housing (Palacio-Morena 1988; Sánchez-Marín 2014).

The IRS decided to carry out an ambitious plan to measure the cost of living. A price questionnaire was prepared for the purpose and sent to provincial boards. To begin with, the boards filled in the questionnaires and returned them to the IRS headquarters in Madrid. 17 By 1909, however, several methodological changes had been introduced to increase consistency and coverage. 18 First, the questionnaires were to be sent to municipalities instead of provincial boards. Second, prices were collected twice a year, in winter (October-March) and summer (April-September). 19 And third, the items included in the questionnaires were to be representative of workers' consumption. Originally a selection of 40 items was made, but this list had shrunk to 22 by 1915 (see Table 5 in the "Appendix").
In order to have consistent and comparable information, we consider only items for which prices were published over the whole period, 1910-1920. The list of items is summarised in Table 1. We also make some adjustments. During 1910-1915 the questionnaire included different types of bread (wheat, barley, maize, rye) and flour (wheat, maize, rye), while from 1915 onwards a single price is reported for bread and for flour. Although we acknowledge quality differences and a marked regional variation in their consumption, we take the cheapest value reported as the price of bread and flour for the early years. 20 The remaining items appeared in both periods. This gives us representative and comparable price information.
Figure 1 shows the number of questionnaires sent (returned and not returned) from 1910 to 1914 and the response rate by province during this period. 21 Approximately 12,000 questionnaires were sent every year and 6500 were returned. The response rate by province ranged from 25-35% to 65-75%. Despite an overall response rate of roughly 50%, the IRS managed to collect enough information to publish consistent and reliable summary statistics in its bulletins. Market price data were presented for each province twice a year (winter and summer). The IRS also distinguished between provincial capitals and other municipalities, 22 and as a result the bulletins show values in the capital along with the highest, lowest and most frequent prices reported in the rest of the province. 23 In short, the bulletins provide the market prices of 22 items in 49 provinces (capital, province) twice a year (winter, summer) from 1910 to 1920. This would amount to a maximum total of 47,432 values. 24 However, the bulletins are occasionally incomplete.
17 There were 49 provinces in Spain at that time. The Canary Islands were a single province, but in 1927 they were split into two, thereby making up the 50 provinces of today.
18 Instituto de Reformas Sociales (1916, pp. 5-6).
19 Instituto de Reformas Sociales (1916, p. 6).
20 From 1915 onwards, the bulletins only provided the price of bread and flour. The main results are robust and consistent when the average price or the price of wheaten bread/flour is used instead.
21 Information for 1915-1920 is not available.
22 In 1910, except for Oviedo, Ciudad Real, Jaén, Pontevedra and Tarragona, the provincial capitals were the largest centres of population in each province.
23 From 1910 to 1914, the bulletins also indicate the municipalities where these prices were reported. See Instituto de Reformas Sociales (1916, p. 7) and Fig. 7 in the "Appendix". From 1915 onwards, the bulletins reported most frequent, minimum and maximum prices without specifying the municipalities; see Fig. 8 in the "Appendix".
24 The dataset contains 49 provinces × 2 (capital, province) × 22 items × 11 years × 2 (winter, summer) = 47,432.
Unreported values represent 14.4% of the potential sample (6851), leaving us with 40,581 prices. Although the distribution of missing values is relatively even, there are some peculiarities worth noting. A good many prices are reported every year, as Fig. 9 in the "Appendix" shows, but this is less evident when looking at specific items. There appears to be a major issue with Housing (1 room), for example, for which roughly half the values are missing. 25 Indeed, this is a major concern since we expect accommodation or lodging costs to be strongly correlated with income. 26 In order to shed further light on the matter, we assess the distribution of these missing prices by year and province. In general, the annual rents for a single room are more frequently reported between 1915 and 1920, but there are some noticeable spatial disparities. Figure 2 illustrates the percentage of reported values with respect to the potential number of price quotations for Housing (1 room) and for all 22 items by province for the whole period. A fundamental issue arises when looking at Madrid and Barcelona, where data representativeness is just 34.1 and 36.4%, respectively. This is a serious concern that must be considered and dealt with in the following section. The main urban agglomerations in the early twentieth century were Madrid and Barcelona, and thus the limited data on housing in these provinces may be affecting our analysis. If, for instance, income and population grew more rapidly in Madrid and Barcelona while housing supply was sluggish, greater spatial disparities in prices may have arisen. A low representativeness of Housing (1 room) would thus lead us to underestimate this effect. In this line of thinking, Carmona et al. (2017) found that rents, despite the growing demand for housing witnessed in the decades before the Civil War (1936-1939), were affordable. In any case, this caveat should be borne in mind when discussing the main results.
Fig. 1 Total number of questionnaires sent (graph) and response rate (map), 1910-1914. Source: Instituto de Reformas Sociales (1916)
25 Unreported prices for Housing (1 room) also include some unusually low values (less than 21 pesetas), which we have excluded. A total of 35 reported values were removed from the sample. This threshold, though arbitrary, is well below the average housing price in our sample, so arguably those observations would be either typos or transcription errors.
26 For a brief description of the housing market in early twentieth-century Spain, see Carmona et al. (2014). "… Spanish law did not allow ownership of land to be held separately from the ownership of rights over that land, and in consequence, all floors of any building and its land were required to have only one owner. Indeed, this created a pecuniary entry barrier to the housing property for urban workers since, typically, houses in cities had several floors and, hence, their price was quite high. As a result, a large rental market was generated. This legal framework that linked land and housing property was in force until the end of the period under study". This state of affairs changed with the Royal Order of 26 October 1939 (Carmona et al. 2014, p. 123).
Notwithstanding these issues, the IRS bulletins provide enough information to build up a dataset of 40,581 market prices on 22 items representative of the consumption of Spanish workers in the early twentieth century, covering 49 provinces. Thus, our dataset fulfils the basic requirements of modern surveys since the prices are representative and comparable across space and over time, thereby permitting the living conditions in both dimensions to be studied. In fact, we are unaware of the existence of any comparable datasets for other countries in a historical perspective.
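As a quick check on the sample arithmetic reported in this section, the following minimal sketch recomputes the potential number of price quotations and the share of unreported values. All figures are taken from the text; the variable names are ours and purely illustrative.

```python
# Dimensions of the IRS price tableau as described in the text.
provinces = 49      # Spanish provinces in 1910-1920
locations = 2       # provincial capital vs. rest of the province
items = 22          # goods priced over the whole period
years = 11          # 1910 to 1920 inclusive
seasons = 2         # winter and summer bulletins

potential = provinces * locations * items * years * seasons
reported = 40_581   # prices actually available, as stated in the text
missing = potential - reported
missing_share = 100 * missing / potential

print(potential)                 # 47432
print(missing)                   # 6851
print(round(missing_share, 1))   # 14.4
```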

Methodology
The country-product-dummy (CPD) method was developed to deal with missing data in the construction of price indices (Summers 1973). 27 This approach states that $p_{ij}$, namely the price of item $i$ in region $j$, is the product of price effects, commodity effects and a random disturbance term,
$$p_{ij} = PPP_j \cdot P_i \cdot v_{ij} \tag{1}$$
where $PPP_j$ is the purchasing power parity of region $j$ with respect to other regions, $P_i$ is the price level of item $i$ relative to other items, and $v_{ij}$ captures the random disturbance terms. The above expression can be rewritten as follows:
$$\ln p_{ij} = \ln PPP_j + \ln P_i + \ln v_{ij} = \pi_j + \lambda_i + \varepsilon_{ij} \tag{2}$$
$$\ln p_{ij} = \sum_{j} \pi_j D_j + \sum_{i} \lambda_i D_i + \varepsilon_{ij} \tag{3}$$
Using ordinary least squares (OLS), the above equation can be easily estimated, where $D_j$ and $D_i$ are region and item dummy variables, while $\varepsilon_{ij}$ captures random error terms, which are independently and identically distributed with zero mean and variance $\sigma^2$. More specifically, $D_j$ is equal to one if the price was collected in region $j$ and zero otherwise. Equally, $D_i$ is equal to one if the price refers to item $i$, and zero otherwise. In order to escape the dummy variable trap, that is, to avoid perfect multicollinearity, one region and one item are omitted and act as a reference group (Wooldridge 2016). Thus, the estimated coefficients have to be interpreted taking into account these reference groups. Having said that, the spatial price deflators or $PPP_j$ will be:
$$PPP_j = \exp(\hat{\pi}_j) \tag{4}$$
The attractiveness of the CPD method lies in its simplicity and transparency (Hill and Hill 2009), especially when dealing with non-comparable items, quality characteristics (Biggeri et al. 2017a, b) and missing data. 28 Yet the most distinctive feature of the CPD approach is its stochastic nature, i.e. it is possible to implement specific econometric tools (Rao 2004; Biggeri et al. 2017a, b). It also provides standard errors, which could be used to detect outliers and errors in the dataset (Hill 2004; Hill and Hill 2009).
Although the International Comparison Program (ICP) only uses the CPD method to compute BH-PPPs, Rao (2005) proposed a generalisation of the standard CPD to estimate general price indices. In line with the ICP approach, he suggests a weighted country-product-dummy (WCPD) method in which item prices are weighted according to their relative importance,
$$\min_{\pi, \lambda} \sum_{i} \sum_{j} w_{ij} \left( \ln p_{ij} - \pi_j - \lambda_i \right)^2 \tag{5}$$
where $w_{ij}$ captures the relative importance or weight, expressed as the expenditure share of item $i$ in region $j$. Unfortunately, we only have expenditure patterns at national level (Ballesteros-Doncel 1997a, b; Maluquer de Motes 2013). In the following section, we use both CPD and WCPD methods with country weights to estimate subnational PPPs.
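To illustrate how the CPD model copes with an incomplete price tableau, the sketch below fits the log-linear model (region effect plus item effect) on a small synthetic dataset with two missing cells. Two caveats apply. First, the data, the function name `cpd_ppp` and the item weights are hypothetical. Second, instead of running OLS on dummy variables, the sketch uses alternating (weighted) averages, a block coordinate descent that minimises the same least-squares criterion; it is a swapped-in technique chosen to keep the example dependency-free, not the estimation routine used in this paper.

```python
import math

def cpd_ppp(prices, ref_region, weights=None, n_iter=200):
    """Fit ln p_ij = pi_j + lambda_i by (weighted) least squares.

    prices  : {(item, region): price}; missing cells are simply absent.
    weights : optional {item: expenditure share} (national weights, WCPD-style).
    Returns {region: PPP relative to ref_region}.
    """
    logp = {key: math.log(p) for key, p in prices.items()}
    items = sorted({i for i, _ in logp})
    regions = sorted({j for _, j in logp})
    w = weights or {i: 1.0 for i in items}
    pi = {j: 0.0 for j in regions}    # region effects, pi_j = ln PPP_j
    lam = {i: 0.0 for i in items}     # item effects, lambda_i = ln P_i
    for _ in range(n_iter):
        # Given pi, the optimal lambda_i is the mean residual over observed regions
        # (the item weight is constant within an item, so a plain mean suffices).
        for i in items:
            res = [logp[i, j] - pi[j] for j in regions if (i, j) in logp]
            lam[i] = sum(res) / len(res)
        # Given lambda, the optimal pi_j is the weighted mean residual over observed items.
        for j in regions:
            num = sum(w[i] * (logp[i, j] - lam[i]) for i in items if (i, j) in logp)
            den = sum(w[i] for i in items if (i, j) in logp)
            pi[j] = num / den
        # Identification: the reference region has pi = 0, i.e. PPP = 1.
        shift = pi[ref_region]
        pi = {j: v - shift for j, v in pi.items()}
    return {j: math.exp(v) for j, v in pi.items()}

# Synthetic, noise-free prices generated as p_ij = PPP_j * P_i (hypothetical numbers).
true_ppp = {"A": 1.0, "B": 1.2, "C": 0.8}
item_level = {"bread": 0.5, "milk": 0.4, "eggs": 1.5, "rent": 20.0}
prices = {(i, j): item_level[i] * true_ppp[j] for i in item_level for j in true_ppp}
del prices["rent", "C"]   # simulate unreported quotations
del prices["eggs", "B"]

ppp = cpd_ppp(prices, ref_region="A")
ppp_weighted = cpd_ppp(prices, ref_region="A",
                       weights={"bread": 0.4, "milk": 0.2, "eggs": 0.1, "rent": 0.3})
print({j: round(v, 4) for j, v in ppp.items()})
```

Because the synthetic prices contain no noise, both the unweighted and the weighted fit recover the generating region effects despite the missing cells; with real data the two would generally differ.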

Empirical analysis
To fully exploit our dataset, we use a CPD method with all the price data. It is worth remembering that we have prices for 22 items collected twice a year (winter and summer) over 11 years in provincial capitals and other municipalities. Since our research focus lies in the spatial dimension, we adjust Eq. (3) to control for time-varying effects as follows,
$$\ln p_{ijt} = \alpha + \sum_{j} \pi_j D_j + \sum_{i} \lambda_i D_i + \sum_{t} \delta_t D_t + \beta \, restprov + \sum_{j} \sum_{t} \theta_{jt} \left( D_j \times D_t \right) + \varepsilon_{ijt} \tag{6}$$
where $\alpha$ is the constant term; $D_j$, $D_i$ and $D_t$ represent province, item and semester dummy variables, respectively, with coefficients $\pi_j$, $\lambda_i$ and $\delta_t$; $restprov$ is a dummy variable, with coefficient $\beta$, that takes the value zero if the price is collected in the provincial capital and one if collected in another municipality; $\theta_{jt}$ is the interaction term province-semester 29 ; and $\varepsilon_{ijt}$ is the random error term. 30 The characteristics of the dataset allow us to extend the reference model (3) and to examine some interesting features such as urban-rural differences (Hill and Syed 2015), and potential asymmetries across provinces over time.
Our empirical strategy relies on both unweighted and weighted regressions. In the unweighted estimation, all prices in our sample enter the regressions with the same weight. However, the weighted estimation requires the use of a basic consumption basket. We therefore have to assign a weight to each of the 22 goods in our sample, based on their respective expenditure share in the budget of an average family at that time. While consumption patterns may vary across provinces and between urban and rural areas, we use a single basket that is deemed to be representative of Spain in the 1910s. 31 The weights assigned are mainly based on the work by Ballesteros-Doncel (1997a, pp. 373-374). 32 The weights given to each of the 22 items in our sample can be consulted in Table 2.
29 With 49 provinces and 22 semesters the CPD model potentially allows for 48 × 21 = 1008 interactions. However, there are 14 missing values, which gives a total of 994 variables.
30 We relax the homoscedasticity assumption and consider robust standard errors.
31 Ideally, one would like to have regional baskets to measure potentially different consumption patterns (Lindert 2016). However, obtaining regionally adapted baskets for the early decades of the twentieth century in Spain is indeed a difficult task. Baskets for specific regions such as Navarra and Vizcaya can be found in Lana Berasain (2007) and Pérez-Castroviejo (2006), respectively. Our results, available upon request, are robust to the inclusion of regional variation in consumption baskets.
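The number of province-semester interaction terms mentioned for the time-adjusted model can be verified with a short enumeration. In this sketch only the counts (49 provinces, 22 semesters, 14 empty cells) come from the text; the labels are hypothetical placeholders, and which cells are empty is not reproduced here.

```python
from itertools import product

# Hypothetical labels: only the counts (49 provinces, 22 semesters) come from the text.
provinces = [f"prov{k:02d}" for k in range(1, 50)]
semesters = [f"{year}-{half}" for year in range(1910, 1921) for half in ("1", "2")]

# Drop one reference province and one reference semester to avoid collinearity.
interactions = list(product(provinces[1:], semesters[1:]))

empty_cells = 14  # province-semester cells without data, as reported in the text
print(len(semesters))                    # 22
print(len(interactions))                 # 1008
print(len(interactions) - empty_cells)   # 994
```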
A fundamental challenge of this approach is the estimation of Eq. (6) and the interpretation of results. This is far from trivial because the reference group depends on several dimensions. Therefore, we first estimate Eq. (6) and then calculate the margins or average values for the variables of interest $D_j$, $D_i$, $D_t$, $\theta_{jt}$. Table 3 shows some initial results. Column CPD refers to the unweighted model, while WCPD refers to the weighted model. 33 In both cases the R-squared values are high (0.9391 and 0.9612), indicating that the models fit the data well.
32 … The item 'clothing' is usually included in these consumption baskets but unfortunately is absent from our price data.
33 While the CPD model is estimated using the ordinary least-squares (OLS) technique, the WCPD is estimated using the weighted OLS technique. For the margins, we use the postestimation command "margins" in the statistical software Stata (StataCorp 2015).
Table 3 Predictive margins of the province dummy variable
In order to assess the existence of spatial price differences, we then turn our attention to the predictive margins of $D_j$. 34 According to Table 3, all the margins are statistically different from zero at the one per cent level. As expected, Barcelona was, on average, the province with the highest value, followed by Cádiz and Oviedo. The provinces with the lowest values were Cáceres, Salamanca and León. Although there are some minor changes in the results obtained from the WCPD model, the relative positions are stable, which supports the consistency of our findings. Furthermore, there seem to be marked price differences between provincial capitals and other municipalities, which is in line with the literature (Rojo and Houpt 2011; Ramon and Ramon 2017; García Gómez and Escudero 2017). Using both approaches, prices were higher in the provincial capitals. 35
34 The predictive margins referring to items $D_i$ and semesters $D_t$ are reported in Table 6 in the "Appendix".
35 In the CPD model the estimated coefficient on restprov is equal to −0.0727, indicating that prices in other municipalities were on average 7% (= [exp(−0.0727) − 1] × 100) lower than in provincial capitals (Halvorsen and Palmquist 1980; Wooldridge 2016). With the WCPD model the urban-rural gap was even larger: prices in other municipalities were 10.9% lower than in provincial capitals (estimated coefficient of −0.1157).
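The conversion of a dummy-variable coefficient in a log-price regression into a percentage effect (Halvorsen and Palmquist 1980) can be reproduced in a few lines. The two coefficients are those reported in the text; the helper name `pct_effect` is ours. Since the dummy equals one for municipalities other than the capital, the transformation gives the percentage gap of those municipalities relative to the capital, and the reverse comparison is slightly larger in absolute value.

```python
import math

def pct_effect(coef):
    """Percentage change in price implied by a dummy coefficient in a log-linear model."""
    return 100 * (math.exp(coef) - 1)

cpd = pct_effect(-0.0727)    # other municipalities relative to capitals, CPD model
wcpd = pct_effect(-0.1157)   # same comparison, WCPD model
reverse = pct_effect(0.0727) # capitals relative to other municipalities, CPD model

print(round(cpd, 1))      # -7.0
print(round(wcpd, 1))     # -10.9
print(round(reverse, 1))  # 7.5
```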
To make it easier to interpret the above results and better quantify the spatial heterogeneity, Table 4 presents the sub-national, normalised PPPs (Spain = 100). 36 In the table we can clearly distinguish between those provinces with a price index higher than the national average (PPP > 100) and those with a lower one (PPP < 100). Two compelling results can be extracted from this analysis: first, the existence of price differences among Spanish provinces at the beginning of the twentieth century, and second, the difference between Barcelona and all the other provinces. Figure 3 provides new evidence on the geographical patterns of regional prices in early twentieth-century Spain. In short, price levels were relatively higher in the northeast (Catalonia), the southwest (Western Andalusia), some northern provinces (Asturias, Guipuzcoa, Vizcaya) and Madrid. In the northwest (Galicia) and the interior (Castile and Leon, Extremadura, Castile-La Mancha) price levels were somewhat below the national average. This pattern is consistent using either approach, thereby pointing to a marked regional disparity in prices.
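For readers who wish to reproduce a normalisation of this kind, the sketch below rescales a set of provincial PPPs so that the national level equals 100. It rests on an assumption that is not confirmed by the text: the national level is taken to be the unweighted geometric mean of the provincial PPPs. The province labels and values are hypothetical and are not estimates from this paper.

```python
import math

def normalise(ppp):
    """Rescale PPPs so that their geometric mean equals 100 (assumed national benchmark)."""
    geo_mean = math.exp(sum(math.log(v) for v in ppp.values()) / len(ppp))
    return {region: 100 * v / geo_mean for region, v in ppp.items()}

# Hypothetical PPPs relative to an arbitrary reference province.
ppp = {"A": 1.00, "B": 1.25, "C": 0.80}
index = normalise(ppp)
print({region: round(v, 1) for region, v in index.items()})
```

A geometric mean is a natural choice here because the CPD model is linear in logarithms, so the normalised indices do not depend on which province was omitted as the reference group.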
Interestingly, Madrid was not among the top five provinces. In fact, the relative position of Madrid was rather stable regardless of the model (CPD; WCPD). 37 Although the literature offers little comparative evidence, our results are in line with official information published later, which indicated that prices in the city of Madrid were not as high as in the cities of Barcelona, San Sebastián, Bilbao or Sevilla. 38 Similarly, Gallego (2016) shows that food prices between 1909 and 1913 in Madrid were, in general, lower than in Barcelona. In this regard, some studies have recently stressed the relevance of environmental conditions, transport, and distribution channels in explaining regional differences in food prices (Nicolau-Nos and Pujol-Andreu 2006; Gallego 2016).
As previously mentioned, one of the main concerns regarding our full dataset is the number of missing values for housing. To be sure that this is not affecting our results for spatial heterogeneity in prices, we repeat the same procedure considering only food, which contains 15 items and represents a 75% share of expenditure (see Table 2). The sub-national PPPs are reported in Table 7 in "Appendix". As we are considering fewer items, the number of observations decreases from 40,581 to 30,188. The R-squares of the main regressions, although slightly lower than those obtained with the full sample, remain high (0.9176 in the CPD-Only food and 0.9200 in the WCPD-Only food). Regarding the spatial differences in prices, these findings reinforce our previous results. Barcelona is, once again, the province with the highest price level, then followed by Tarragona and Lérida, while Salamanca, Cáceres and Zamora exhibited the lowest prices.
Equally, our approach also permits the evaluation of prices between 1910 and 1920. Figure 4 shows the time-effect $D_t$ of four different specifications (CPD, WCPD, CPD-Only food, WCPD-Only food). In line with the existing evidence (de Ojeda Eiseley 1988; Ballesteros-Doncel 1997a; Maluquer de Motes 2013), we find that prices in Spain rose after the outbreak of WWI. 39 Statistically, the results are, on the whole, significant. That said, it is worth stressing that, in spite of all the difficulties associated with the historical price information, we have assembled a nationwide dataset with 40,581 market prices. Ballesteros-Doncel (1997a) also made use of the bulletins published by the Instituto de Reformas Sociales, but only included
Finally, the interaction term (province-semester: $\theta_{jt}$) allows us to evaluate the behaviour of prices across provinces and over time. Figure 5 illustrates the $\hat{\theta}_{jt}$ as box plots, where each box spans the 25th to 75th percentiles of the distribution in each semester $t$. Spatial asymmetries appear to be minor before WWI but become more apparent thereafter.
37 The results are stable when all items or only food are considered. Table 7 in the "Appendix" illustrates the sub-national PPPs when only food items are included in the CPD and WCPD models.
38 Using information processed by the Instituto Nacional de Estadística (INE) in 1949, a cost-of-living index (CLI) for the Spanish provincial capitals was published a year later (Comisaría General para la Ordenación Urbana de Madrid 1950). The information was presented as the real value of 1 peseta in Madrid. Although different in nature, the correlation between this CLI and our provincial price levels ranges between 0.52 and 0.60.
39 As Maluquer de Motes (2013) points out, the sudden rise in the level of prices stimulated research and discussion in early twentieth-century Spain (Bernis 1923; Instituto de Reformas Sociales 1923
The growing dispersion of the estimated interaction terms indicates that prices behaved asymmetrically across Spanish provinces. In this context, the increasing inequality in wages or income from 1910 to 1920 (Rosés et al. 2010) was not just the result of productivity differences but also of asymmetries in price growth. On the one hand, this demand-shock could have stimulated investment in certain industries, and hence provinces, where productivity was already high. On the other, the asymmetric behaviour of prices could have exerted upward pressure on wages in order to preserve purchasing power.
Fig. 4 Evolution of prices in Spain, 1910-1920 (09/1914 = 100). Note: The margins of $D_t$ for the CPD and WCPD models can be found in Table 6 in "Appendix".
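The two summaries used in this part of the analysis, a price index rebased to a reference period and the spread of the province-semester effects, can be sketched as follows. The snippet assumes that the estimated time effects and interaction terms are expressed in logs, as is usual in a CPD regression on log prices; the function names and all numbers are hypothetical and do not reproduce the paper's estimates.

```python
import numpy as np

def rebase_index(time_effects, base):
    """Convert log time effects into an index with the base period = 100."""
    d = np.asarray(time_effects, dtype=float)
    return 100.0 * np.exp(d - d[base])

def iqr_by_period(theta):
    """Interquartile range of province effects in each period.

    theta has shape (n_provinces, n_periods); the result has one value
    per period, matching the height of each box in a box plot.
    """
    q75, q25 = np.percentile(theta, [75, 25], axis=0)
    return q75 - q25

# Hypothetical log time effects for three periods; period 1 is the base.
index = rebase_index([-0.05, 0.0, 0.20], base=1)

# Hypothetical interaction terms: 3 provinces (rows) x 2 periods (columns).
theta = np.array([[-0.02, -0.10],
                  [ 0.00,  0.00],
                  [ 0.02,  0.10]])
spread = iqr_by_period(theta)
```

A widening interquartile range from one period to the next corresponds to the growing dispersion described in the text.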

Conclusions
This paper has explored regional prices in early twentieth-century Spain. Using information from bulletins published by the Instituto de Reformas Sociales between 1910 and 1920, we first created a database of 40,581 prices quoted for 22 items for each of the 49 provinces. In order to fully exploit this dataset, we then estimated provincial price levels for the whole period with a time-adjusted country-product-dummy (CPD) model. We also included expenditure weights in the methodology, i.e. a weighted country-product-dummy (WCPD) model, to assess the robustness of the results.
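To illustrate how expenditure weights can change a CPD estimate, the sketch below assumes that the weights enter as weighted least squares, so that each log-price residual is weighted by the expenditure share of its item. This is one common formulation of a weighted CPD and may not match the paper's specification in every detail; the function name, the two-item setup and the prices are hypothetical, and the 0.75 share only echoes the food share mentioned earlier.

```python
import numpy as np

def wcpd_province_effects(log_price, province, item, weight, n_prov, n_item):
    """Weighted least-squares CPD: ln p = a_province + b_item + e.

    Province 0 is the reference. Rows are scaled by sqrt(weight), which is
    equivalent to minimising the weighted sum of squared residuals.
    """
    n = len(log_price)
    rows = np.arange(n)
    X = np.zeros((n, n_item + n_prov - 1))
    X[rows, item] = 1.0                                   # item dummies
    nonref = province > 0
    X[rows[nonref], n_item + province[nonref] - 1] = 1.0  # province dummies
    sw = np.sqrt(weight)
    coef, *_ = np.linalg.lstsq(X * sw[:, None], log_price * sw, rcond=None)
    return np.concatenate([[0.0], coef[n_item:]])

# Two provinces, two items; the log-price gap differs by item (0.2 vs 0.5).
province = np.array([0, 1, 0, 1])
item = np.array([0, 0, 1, 1])
log_price = np.array([1.0, 1.2, 2.0, 2.5])
shares = np.array([0.75, 0.25])  # hypothetical expenditure shares by item

weighted = wcpd_province_effects(log_price, province, item, shares[item], 2, 2)
unweighted = wcpd_province_effects(log_price, province, item, np.ones(4), 2, 2)
```

In this balanced example the unweighted province effect is the simple average of the two item gaps (0.35), whereas the weighted effect is their share-weighted average (0.275), so items with a larger budget share pull the estimate towards their own price gap.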
Overall, we find regional disparities in market prices in early twentieth-century Spain. In line with the Balassa-Samuelson conjecture, it appears that productivity and prices were somewhat related. For example, prices in the leading industrial provinces of Barcelona and Vizcaya were above the national average. Moreover, price levels in provincial capitals were on average higher than in other municipalities. Interestingly, there was also variation across less industrialised provinces, and the price level in the capital province of Madrid was not as high as might have been expected. This calls for a careful study of the environmental conditions, transportation and distribution networks in each province.
In addition, and in line with the existing literature, this study has also shown that prices rose sharply after the outbreak of World War I. Nevertheless, this price growth was asymmetric across Spanish provinces, and hence spatial differences widened. This finding casts doubt on existing literature in two respects. First, using only the market prices of the city of Barcelona as representative of Spain between 1910 and 1920 can be misleading (Maluquer de Motes 2013). Second, our study also implies that increasing regional wage or income disparities (Rosés et al. 2010) from 1910 to 1920 were not just the result of productivity differences but also of asymmetries in price growth.
Fig. 5 …, 1910-1920 (09/1914 = 100). Note: Using the CPD model, each box plot contains the estimated values ($\hat{\theta}_{jt}$) from the 25th to 75th percentiles of the distribution. The dividing line is the median. Outliers are outside the box and the adjacent lines.
Having said that, it is worth stressing that in a developing economy, such as early twentieth-century Spain, uneven development might give rise to sizeable differences in prices and wants, which must be borne in mind when interpreting past events or designing economic policies.
Fig. 9 Representativeness (%) of reported prices by (a) year and (b) item