
The most useful elucidation comes from the 15 indicators since these designate relatively independent yet vital areas at the policy and action level. The indicators are building blocks of ESI, and they indicate for which factors a state’s score is high or low. For example, in Uttar Pradesh, high population pressure, low natural resource endowment and high water pollution bring the ESI score down. The state’s performance in many other indicators such as natural resource depletion, waste generation, energy management and government’s initiative has been modest. Since natural resource endowment is difficult to alter in a positive direction, the state can improve its sustainability index by focusing on two immediate challenges: high water pollution and high population pressure. The five broad policy components in ESI are population pressure, environment stress, environment systems, environment impact and environment governance. Though not used in calculating the ESI score, they simplify the multidimensional concept of Environmental Sustainability.
i. Selection of the variables in ESI
In calculating ESI, data covering a wide range of environmental factors were sought. The variables were chosen according to their relevance, accuracy and reliability. For some variables, as was the case for air and water pollution, data for each state were not available. Thus datasets had to be customized with data gathered from multiple sources. If the chosen datasets were relevant, they were further scrutinized for accuracy and reliability. In most cases, data were sought from the most recent available government sources and reports. In India, government data are the most reliable and are collected using a standard methodology. For example, all data related to forest cover, total geographical area, etc. were taken from the Forest Survey Report, 2003. In certain cases proxy variables were used to capture important measures. For biodiversity, since data for threatened species of mammals, birds and reptiles as a percentage of total known breeding species were not available at the state level, the proxy variables of total percentage of wetland area and total percentage of protected area were used instead. The percentage of untreated waste water discharged to total waste water generated was taken as a proxy for water pollution by industries, and consumption of fertilizers and pesticides per hectare of agricultural land was taken as a proxy measure of water pollution arising from agriculture.
ii. Standardization of the variables for comparisons across states
To use data for calculating the ESI score for each state, the raw data should be on a comparable scale; therefore suitable denominators were chosen to transform data into a comparable scale. For example, data like forest cover was made comparable by taking total geographic area as the denominator, while data like incidence of respiratory disease was made comparable by taking total population as the denominator. The most commonly used denominators were GDP, total population and total geographical area. This process ensured that no state was given undue advantage or disadvantage because of its geographical size or population. Also the percentage change of a variable was taken into account in some cases to capture the rate of flow of resources or the rate of accumulation of waste. In doing so, a state’s relative performance over the years is gauged; this procedure further mitigates differences arising from area or population size.
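The denominator step described above can be sketched as follows. This is a minimal illustration only: the function names `per_unit` and `pct_change`, the state labels and all figures are hypothetical and are not taken from the study.

```python
def per_unit(raw_values, denominators):
    """Divide each state's raw value by its denominator
    (e.g. total geographical area, total population or GDP)."""
    return {state: raw_values[state] / denominators[state]
            for state in raw_values}


def pct_change(earlier, later):
    """Percentage change of a variable between two years."""
    return (later - earlier) / earlier * 100.0


# Hypothetical figures, for illustration only.
forest_cover_km2 = {"State A": 12000.0, "State B": 3000.0}
total_area_km2 = {"State A": 240000.0, "State B": 30000.0}

# Share of each state's area under forest: comparable across states
# of very different size.
forest_share = per_unit(forest_cover_km2, total_area_km2)

# Change in forest cover of a hypothetical state between two surveys.
forest_change = pct_change(200.0, 210.0)
```

In this illustration the smaller state has the larger forest share even though its absolute forest cover is lower, which is the kind of size effect the denominators are meant to remove.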
iii. Transformation of the variables for the imputation and aggregation procedures
After adjusting the data for comparisons across states, the data were then aggregated. In order to adjust for the different units of the different variables and the need to assign a relative score to each state, the data were transformed into Z-scores, which represent standardized deviations from the mean. These Z-scores have a mean of 0 and a standard deviation equal to 1. The Z-score is calculated from the following formula:
Z = (x - µ) / σ
Where,
x = value of the variable; µ = mean; σ = standard deviation
Z-scores computed from datasets with different units can be directly compared since these numbers do not express the original unit of measurement. As the Z-score represents the number of standard deviations from x to the mean, it gives a relative score for all variables. In the cases where a state’s performance on either extreme of the spectrum might have skewed its overall score, logarithmic transformation was performed to reduce the impact of outliers. All variables that had a skewness value less than 2.5 were transformed using the Z-score transformations and the rest were transformed using the logarithmic transformations. The latter were then again converted into Z-scores such that they can be aggregated. The variables for which logarithmic transformation was used are listed in Table 2.
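The transformation rule described above can be sketched in a few lines. Several details are assumptions, since the text does not specify them: the population standard deviation is used, skewness is computed as the third standardized moment, the natural logarithm is used, and all values are assumed to be strictly positive wherever a logarithm is taken. The function names are illustrative.

```python
import math
import statistics


def z_scores(values):
    """Z = (x - mean) / standard deviation, for every value."""
    mu = statistics.mean(values)
    sigma = statistics.pstdev(values)  # assumption: population SD
    return [(x - mu) / sigma for x in values]


def skewness(values):
    """Third standardized moment (assumed definition of skewness)."""
    return sum(z ** 3 for z in z_scores(values)) / len(values)


def transform(values, threshold=2.5):
    """Plain Z-scores when skewness is below the threshold; otherwise
    take logarithms first, then convert the logs to Z-scores."""
    if skewness(values) >= threshold:
        values = [math.log(x) for x in values]  # requires x > 0
    return z_scores(values)
```

With these definitions a symmetric variable such as `[1.0, 2.0, 3.0]` has zero skewness and is passed straight to `z_scores`, while a variable dominated by one very large value is log-transformed first.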
iv. Substituting values for missing data
There were many instances where no value was available for the variable in the current dataset. As discussed earlier, this is a serious constraint of the study, as no values for a particular state may affect its ESI score. Missing data may also reduce the precision of a calculated ESI score because there is less information than originally planned. The regression imputation method was used to impute the missing values. It is based on the assumptions that the marginal distributions of the data are normal and that linear relationships between variables exist that can be utilized for building linear regression models that predict the missing data.
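As a rough sketch of regression imputation, the example below fits an ordinary least-squares line between one fully observed predictor variable and a target variable with gaps, then fills each gap with the fitted value. The study does not state which predictors or how many were used; the single-predictor model, the use of `None` to mark a missing value, and the function names are assumptions made for illustration.

```python
def fit_line(xs, ys):
    """Ordinary least-squares slope and intercept for y = a + b * x."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


def impute(predictor, target):
    """Replace None entries in target with values predicted from predictor."""
    observed = [(x, y) for x, y in zip(predictor, target) if y is not None]
    slope, intercept = fit_line([x for x, _ in observed],
                                [y for _, y in observed])
    return [y if y is not None else intercept + slope * x
            for x, y in zip(predictor, target)]


# Hypothetical data: the third state has no reported value for the target.
filled = impute([1.0, 2.0, 3.0, 4.0], [2.0, 4.0, None, 8.0])
```

Observed values are left untouched; only the gaps are replaced by predictions, so an imputed figure is an estimate and not a measurement.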
v. Changing direction of the Z-Score according to the ESI
Since there were both positive and negative variables among those chosen, the computed Z-scores for all variables lie within the range +3 to -3. For example, in the case of the variable % change in forest cover, a positive Z-score would highlight a change in a favorable direction and a negative Z-score would highlight a change in an unfavorable direction. Here, the interpretation of the Z-score is the same as that of the ESI score, so in such cases the Z-score can be used directly without recoding it for direction. But in the case of the variable population density, a more positive deviation from the mean (i.e., a higher Z-score) would mean a higher absolute value for a state. Thus a higher positive Z-score would mean a high population density, which is not in a favorable direction for the ESI of that state. The Z-score should therefore be recoded for its direction. This was done by simply changing the sign of the Z-score while keeping its magnitude the same. Thus, all higher positive Z-scores would get converted into lower negative Z-scores and vice versa.
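The recoding step amounts to a sign flip for variables where a higher value is unfavorable. A minimal sketch, with an illustrative function name and hypothetical Z-scores:

```python
def recode_direction(z_values, higher_is_better):
    """Keep Z-scores as they are for favorable variables (e.g. % change in
    forest cover); flip the sign for unfavorable ones (e.g. population
    density) so that a higher score always means better performance."""
    if higher_is_better:
        return list(z_values)
    return [-z for z in z_values]


# Hypothetical Z-scores for population density in three states.
density_z = [1.5, -0.5, 0.0]
density_recoded = recode_direction(density_z, higher_is_better=False)
```

The magnitudes are unchanged; only the direction of interpretation is aligned with the ESI score.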
vi. Winsorization of the data
Winsorization is an imputation rule limiting the influence of the largest and smallest observations in the available data (OECD, Definition). It does so by shifting the observations in the tails of the distribution to specified percentiles. For each variable, the values exceeding the 95th percentile are lowered to the 95th percentile and the values smaller than the 5th percentile are raised to the 5th percentile. This was done so that a few very large or very small values do not bias the ESI score for a state.
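The winsorization rule can be sketched as below. The percentile definition is an assumption (linear interpolation between order statistics), as the text does not say which convention was applied; the function names are illustrative.

```python
def percentile(sorted_values, p):
    """p-th percentile by linear interpolation (assumed convention)."""
    k = (len(sorted_values) - 1) * p / 100
    lower = int(k)
    upper = min(lower + 1, len(sorted_values) - 1)
    fraction = k - lower
    return sorted_values[lower] + (sorted_values[upper] - sorted_values[lower]) * fraction


def winsorize(values, low_pct=5, high_pct=95):
    """Raise values below the 5th percentile to it and lower values
    above the 95th percentile to it; leave the rest unchanged."""
    ordered = sorted(values)
    low_cut = percentile(ordered, low_pct)
    high_cut = percentile(ordered, high_pct)
    return [min(max(x, low_cut), high_cut) for x in values]


# Hypothetical variable observed for 21 states, with values 1 to 21.
winsorized = winsorize(list(range(1, 22)))
```

Under this convention, only the single smallest and single largest of the 21 hypothetical values are moved; the order of the states is preserved.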
vii. Aggregation of the data to indicator scores and the final ESI score
Of the multiple methods of aggregation available, the equal weighted average was used to compute the ESI. There were 44 underlying variables, which were aggregated into the 15 indicators that were used to calculate the final ESI score. In aggregating, equal weight was given to each variable. The scores for all 15 indicators were combined to give the final ESI score. The ESI scores were made comparable by rescaling them from a low of 0 to a high of 100. The states were ranked according to their ESI score. The higher the ESI score, the better the state’s performance and the higher its ranking.
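The aggregation can be sketched as two rounds of equal-weighted averaging followed by a rescaling. The rescaling formula is not given in the text; min-max scaling is assumed here purely for illustration, and the state labels, indicator names and Z-scores are hypothetical.

```python
from statistics import mean


def raw_score(variables_by_indicator):
    """Average variable Z-scores within each indicator, then average
    the indicator scores with equal weight."""
    indicator_scores = [mean(zs) for zs in variables_by_indicator.values()]
    return mean(indicator_scores)


def rescale_0_100(scores):
    """Min-max rescaling to the range 0 to 100 (assumed method)."""
    low, high = min(scores.values()), max(scores.values())
    return {state: 100 * (s - low) / (high - low)
            for state, s in scores.items()}


# Hypothetical direction-adjusted Z-scores for two indicators in three states.
data = {
    "State A": {"air": [1.0, 0.0], "water": [-1.0]},
    "State B": {"air": [0.0, 0.0], "water": [1.0]},
    "State C": {"air": [-1.0, -1.0], "water": [-1.0]},
}

esi = rescale_0_100({state: raw_score(v) for state, v in data.items()})
ranking = sorted(esi, key=esi.get, reverse=True)  # best state first
```

Because the result depends only on each state's position relative to the others, the scores are comparative and do not measure distance from any absolute environmental target.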
viii. Data Sources
The major data sources were the databases of the Census of India, government surveys, reports and websites, parliament questions, and the Central Pollution Control Board. Natural resource endowments data were available from the reports published by the Ministry of Environment and Forests, India. The literature and data sources available within the environment information system (ENVIS), India were also consulted for the study. Additionally, the parliamentary session data books proved useful, as they provided testimony to the concerns of policy makers regarding the environment, as well as steps taken to mitigate environmental degradation. Other reports consulted include: Annual Forests Reports, Annual Plans, National Human Development reports, National Population Census, Agricultural Census, National Family and Health Survey reports, National Sample Survey reports, Statistical Compendium of Environmental Indicators, other Central Statistical Organization (CSO) reports, State Budgets, and other government reports. Two OECD publications (the report of the Working Group on Environmental Information and Outlooks on “Aggregated Environmental Indices-Review of Aggregation Methodologies in Use” and the OECD Statistics Working Paper, “Handbook on Constructing Composite Indicators: Methodology and User Guide”) provided frameworks through which statistical methodologies could be compared.
ix. Limitations
The state of the environment is multidimensional and is difficult to capture in a single index. The ESI is not designed to provide an exhaustive picture of a state’s environmental issues, but rather to help reveal trends and draw attention to phenomena that require further analysis and possible action. ESI should thus be considered only as one tool for evaluation. To gain a holistic picture of environmental issues ESI should be supplemented by scientific and policy-oriented analyses that explore the causal factors driving environmental changes and offer prescriptions on how to mitigate such change. A more robust outcome can be calculated using the same methodology, provided that better data are available.
ESI uses the state as the unit of measurement. Each state has different ecological, geographical, social, economic and institutional structures. Beyond inter-state differences, variation within states, especially large states, can be quite high. While large states like Uttar Pradesh or Maharashtra are heterogeneous with unequal wealth and resource distribution, smaller states like Goa are more homogenous with a small population and geographical area. Although these differences may have some impact on ESI scores, the state was chosen as the unit of measurement as most of the resources in India are measured at the state level, and hence data are available at the state level only. Moreover, as the state is the key unit for policy formulation and implementation, the choice of the state as the unit of measurement is relevant for policy makers.
In choosing the 44 underlying variables, consideration has been given to develop as complete a picture of the state of the environment as possible. As it would have been neither practical nor possible to measure every aspect of the environment, some variables were by necessity omitted.
Another important limitation in developing ESI was the availability of data. For those desired variables for which reliable data sources could not be found, proxy variables were used instead. As with other environmental studies, missing values have been a serious concern in the formulation of ESI. Missing values were estimated through the statistical process of imputation, which yields an estimate based on correlations with the values of other variables rather than the actual measured value. Therefore, missing values may have influenced the final ESI score of a state.
ESI 2007 is not a proximity-to-target approach where a state’s performance is measured and compared in absolute terms, but rather a measure of relative performance since environmental benchmarks are difficult to set.
