Open Access Scientific Reports


Empirical Algorithms for Estimating Total Phosphorus During Restoration of Urbanized Polluted Rivers Using Bacterial Technology

Research Article Open Access
State Key Laboratory of Hydrology-Water Resources and Hydraulic Engineering, College of Hydrology and Water Resources, Hohai University, Nanjing 210098, China
*Corresponding author: Amos T Kabo-bah
State Key Laboratory of Hydrology-Water
Resources and Hydraulic Engineering
College of Hydrology and Water Resources
Hohai University, Nanjing 210098, China
E-mail: kabo-bah@greenwaterhut.org
           kabobah@hhu.edu.cn
 
Received March 08, 2012; Published September 10, 2012
 
Citation: Kabo-bah AT, Yuebo X, Yajing S (2012) Empirical Algorithms for Estimating Total Phosphorus During Restoration of Urbanized Polluted Rivers Using Bacterial Technology. 1:302. doi:10.4172/scientificreports.302
 
Copyright: © 2012 Kabo-bah AT, et al. This is an open-access article distributed under the terms of the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original author and source are credited.
 
Abstract
 
The expensive nature of water quality field campaigns calls for mathematical models to support the forecasting of indicators. This is particularly important for bacterial technology applied to urban rivers, where data are usually scarce but rapid reporting of preliminary results during project implementation is essential. This research developed mathematical models to estimate total phosphorus when a set of water quality indicators (NH3-N, pH and COD) is available. The models were tested with adjusted R2, corrected Akaike Information Criterion (AICc), Bayesian Information Criterion (BIC), PRESS and RMSE. All models showed a prediction error (RMSE) below 20% except TPM 3 (RMSE = 20.2%). This prediction error is consistent with the measurement error of 15-20% for most water quality variables. The TPM 1 model is recommended as the best option for forecasting purposes in water quality modelling. The research can contribute significantly towards numerical water quality modelling in China and possibly in other countries with water quality problems similar to those discussed in this paper.
 
Keywords
 
Total phosphorus; Empirical models; Water quality
 
Introduction
 
Water quality management is a major issue in China as its fast-growing economy moves to take the lead in the world. Urbanized streams and rivers are exposed to large increases in nutrient loads, which affects their aesthetics. Highly polluted rivers may also have a reduced capacity to withstand incoming storms, floods or excessive precipitation. Nutrient dynamics have been studied in several parts of the world [1]. The restoration of urban rivers can possibly contribute towards flood prevention and effective control. Large cities grow yearly in population and industrial activity. Because of the availability of jobs, advanced technology and other social opportunities, most people prefer to live in cities. As noted in recent research, urban growth is changing and populations have increased from 15 to 50% globally [2]. In urban ecological sustainability, nutrient recycling has been recommended as the best alternative for wastewater management. Urban rivers are subjected to sewage disposal and other waste as a result of poor community practices. Urban water and wastewater planners in China are struggling to meet the growing water and sanitation demands while attaining a sustainable urban water system [3]. River restoration campaigns have been identified as supporting Agenda 21 and the recent COPs 36 discussion on Climate Change [4,5].
 
Several techniques have been employed over the years for stream and river restoration. In general, these techniques involve re-aeration using weirs, shifting the effluent discharge location, using an oxygenator to pump air into the water body and using engineered constructed wetlands. Research into the cost and feasibility of these methods found wetlands to be the most efficient in treating streams or rivers. Wetlands have been found to remove significant amounts of nutrients (i.e. over 70%) and are far better for long-term maintenance and management. Bacterial technology has been used for many years in many countries for industrial and domestic wastewater treatment. However, its use for the restoration of urbanised rivers in China is relatively new. For instance, the method has been successfully used in the treatment of lakes, wastewater treatment plants and urban streams in Shenzhen, Rui’an and Wuxi in China. The method has received great attention because of the innovative and fast way in which it lowers BOD and COD concentrations [6]. These bacterial technologies have been seen as necessary protocols towards sustainable management of urban water systems. However, there is a lack of models for forecasting and monitoring nutrient concentrations during such treatment campaigns. Predictive modelling of nutrient storage in rivers is an important aid to the monitoring of urban rivers. For instance, mathematical expressions derived by Vollenweider were used to estimate the fate of phosphorus in water bodies [7]. The challenge was that these expressions required measurements such as the lake’s surface area, areal water load, water residence time and inflow phosphorus concentration to estimate total phosphorus. In a typical river restoration program with bacterial technology, these measurements are not relevant.
In another study, statistical models were successfully used to investigate Escherichia coli concentrations at beaches on Lake Michigan [8]. The fate of faecal indicators was also modelled in the UK using digital land use maps [9]. Further, another study on lakes in the U.S. used regression models to monitor nutrient and bacteria concentrations in real time [10]. These models were typically applied to lake conditions, and the parameters used were far different from those of concern in bacterial technology programs for rivers. Therefore, the lack of mathematical models to predict total phosphorus dynamics in support of monitoring, planning and management of urban rivers during bacterial technology programs remains a major setback. The estimation of total phosphorus in urban rivers is an important indicator for understanding eutrophication problems [11]. This research therefore developed mathematical algorithms to model the fate of total phosphorus in urbanised, biologically treated rivers. The developed algorithms were tested with independent datasets from other restoration experiments. The research is expected to contribute substantially to recent work in numerical water quality modelling, the management of wastewater treatment plants and, possibly, the approximation of nutrients in modelling packages such as ASM1, ASM2, ASM2D or ASM3 [12].
 
Methods
 
Study area
 
The research data were obtained from a biological treatment project in Rui’an city in Zhejiang Province, China. The project was undertaken in November and December 2010. The data included daily measurements of Total Phosphorus (TP), Ammonia (NH3-N), Chemical Oxygen Demand (COD) and pH. The project was implemented in two riverine systems, the Fenghu and Songyong Rivers. Six sampling points were observed and all measurements were used for this research (Figure 1). The source of the Fenghu River is the Liangmian River and its outlet is the Wenruitang River. The Songyong River flows directly into the Fenghu River. The study area is characterized by relatively warm weather, a short winter and a longer summer. The precipitation period for this area is divided into three parts: May to April, May to June and August to September. This is partly because the study area is located in the typhoon zone of China. The Songyong River is about 280 m long, with a breadth of 5-18 m and a water depth of about 1-3 m. The Fenghu River is about 740 m long to the outlet, with a breadth of 6-15 m and a water depth of about 1 m.
 
Figure 1: Study area indicating sampling points in Rui’an region, China
 
The independent datasets were obtained from the Xuxi River, which is located in the Chang Nan District of Wuxi City, Jiangsu Province. The total length of the Xuxi River is 1.36 km. It has an average surface width of 4.50 m and a water depth of 1.40 m. The poor environmental sanitation around this river contributes to its high sewage loads. It is estimated that about 10,000 m3 of sewage is discharged daily into the river, partly because there is no sewage treatment facility [6]. The river is generally characterised by dark brown sediment and floating algae, which cause the river to appear dark green.
 
Bacterial technology
 
The purpose of this paper is not to narrate all the processes involved in this method. The emphasis is rather on the use of monitored data for the derivation of the algorithms. In this method, the selected bacteria and a microbial accelerator were injected directly into the river to activate the native bacteria. In all, thirty-four buckets, each containing 150 kg of bacteria, were used for restoring the quality of the Fenghu-Songyong River. For convenience, the bacteria were cultured near the river and injected into the six sections (Figure 1).
 
Data
 
The models were derived and tested using two independent datasets: the bacterial technology experiments in the Fenghu and Songyong Rivers (FSR) in Rui’an city, Zhejiang Province, and in the Xuxi River in Jiangsu Province. The FSR data were obtained in December 2010 (Table 3A in appendix) and were used to derive the models. The models were validated using the independent dataset from the Xuxi River, sampled in October 2009 (Table 3B in appendix). The datasets recorded daily measurements of Total Phosphorus (TP), Ammonia (NH3-N), Chemical Oxygen Demand (COD) and pH. Because data from such projects are scarce, it was assumed that a derived model able to predict the variables of an independent dataset effectively is functional for practical purposes. The climatic and weather conditions of the two datasets were different, so good predictive power on the independent dataset implies that the model is appropriate for future practical use. The Ministry of Environmental Protection 2009 report on surface water quality stated that about 42.7% of the rivers in China range from Grade IV to V, which represents a fairly poor water quality status. This implies that about 42.7% of rivers in China are not suitable for aquatic life and domestic uses. Therefore, the Fenghu and Songyong Rivers were selected as a representative case for the majority of rivers in China that fall in the inferior category of Class V (Table 3A in appendix).
 
The Virtual Beach (VB) MLR model
 
 
The VB Multiple Linear Regression (MLR) Tool by USGS was used to support the work of this study. The study employed the statistical capability of the package to handle multiple linear regressions. This was used to develop the model for predicting the fate of TP against a given set of water quality variables. The MLR model [13] is given by
 
$$P = \alpha_0 + \sum_{j=1}^{k} \alpha_j X_j + \varepsilon \tag{1}$$

where P is the predicted total phosphorus,

α0 is the intercept,

αj is the slope for the jth explanatory variable Xj (j = 1, ..., k),

ε is the remaining unexplained noise in the data (error).
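As a minimal sketch of how a model of the form of Equation (1) can be fitted by least squares, the Python snippet below uses NumPy on a small synthetic dataset. The helper name `fit_mlr` and all numerical values are illustrative assumptions. They are not the FSR measurements, and the snippet does not reproduce the VB implementation.

```python
import numpy as np

def fit_mlr(X, y):
    """Least-squares fit of y = a0 + sum_j a_j * X[:, j] + error.

    Returns (intercept, slopes). `fit_mlr` is a hypothetical helper,
    not part of the Virtual Beach tool.
    """
    X = np.asarray(X, dtype=float)
    y = np.asarray(y, dtype=float)
    # Prepend a column of ones so the first coefficient is the intercept.
    design = np.column_stack([np.ones(len(y)), X])
    coef, *_ = np.linalg.lstsq(design, y, rcond=None)
    return coef[0], coef[1:]

# Synthetic, noise-free example (illustrative values only).
# The three columns stand in for NH3-N, COD and pH.
X_demo = np.array([
    [1.0, 20.0, 7.0],
    [2.0, 25.0, 7.4],
    [3.0, 22.0, 6.8],
    [4.0, 30.0, 7.1],
    [5.0, 28.0, 7.6],
])
y_demo = 0.5 + 0.1 * X_demo[:, 0] + 0.02 * X_demo[:, 1] - 0.05 * X_demo[:, 2]

intercept, slopes = fit_mlr(X_demo, y_demo)
print(intercept, slopes)
```

Because the synthetic response is generated without noise, the fit recovers the coefficients used to build it, which makes the example easy to check by eye.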
 
The MLR analysis relies on the least squares method to fit models. It is therefore subject to several considerations, i.e. variable interactions, multicollinearity and model selection [14]. VB uses the backward elimination method to help the user select the most appropriate model from the specified explanatory variables. VB facilitates model development and offers better chances of developing good models with limited datasets. VB has been successfully used to develop models for the fate of biological contaminants at beaches [14-16]. VB also has a function to perform data transformations. By default, MLR equations are linear, which tends to limit the value of explanatory variables. VB offers a number of transformation methods, such as square root and square.
 
Model evaluation
 
VB offers several options for checking the fits of the selected models and the forecasts. In general, goodness of fit and predictive capacity are important to describe a model’s ability to predict. In this paper, emphasis was laid on the use of adjusted R2 (R2a), Prediction Sum of Squares (PRESS), Corrected Akaike Information Criterion (AICc), Bayesian Information Criterion (BIC) and Root Mean Square Error (RMSE). The R2a, AICc and BIC were applied to choose the most suitable model for describing the fate of total phosphorus. The PRESS and RMSE were used to determine the predictive power and accuracy, respectively, of each model.
 
Generally, R2 tends to increase as more variables are added, so picking the largest R2 will not necessarily select the best model and mostly results in over-fitting. R2 is therefore suitable only for comparing models that have the same number of variables. For the purposes of this research, the adjusted R2 was used to penalize overly complicated models, as in the equation below:
 
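$$R_a^2 = 1 - (1 - R^2)\,\frac{n - 1}{n - p - 1} \tag{4}$$

where n is the number of observations and p is the number of explanatory variables in the model.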
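The penalty applied by the adjusted R2 can be illustrated with a short Python sketch. It assumes the usual definition, with n observations and p explanatory variables. The function name `adjusted_r2` and the example numbers are hypothetical and are not taken from Table 1.

```python
def adjusted_r2(r2, n, p):
    """Adjusted R^2 for n observations and p explanatory variables."""
    if n - p - 1 <= 0:
        raise ValueError("need n > p + 1 observations")
    return 1.0 - (1.0 - r2) * (n - 1) / (n - p - 1)

# Same raw R^2, different model sizes (illustrative numbers only):
one_variable = adjusted_r2(0.90, n=30, p=1)
three_variables = adjusted_r2(0.90, n=30, p=3)
print(one_variable, three_variables)
```

For the same raw R2, the model with more variables receives the lower adjusted value, which is the behaviour the criterion is meant to provide.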
 
The adjusted R2 has been used extensively to illustrate model performance in the literature [13,14,17]. The Akaike Information Criterion (AIC) [18] evaluates the goodness of fit of a selected model. In its most general form, the AIC is given by
 
$$\mathrm{AIC} = 2i - 2\ln(j) \tag{5}$$

where i is the number of variables in the model, and

j is the maximized value of the likelihood function for the estimated model.
 
Among the candidate models for a given dataset, the preferred model is the one with the minimum AIC value. AIC rewards goodness of fit and includes a penalty that increases with the number of estimated variables. AICc is AIC with a correction for finite sample sizes; in effect, AICc is AIC with a greater penalty for extra variables in the model. The use of AICc (Equation 6) is strongly recommended in [19]. The AICc was a refinement by [20] to cater for the bias in regression with small sample sizes. In this research, the AICc was therefore adopted. It was also found that the use of AIC gave better results than BIC. Despite this, the BIC was included in the research to complement the AICc estimates. The BIC penalises complex models most and gives preference to simpler models in selection. The generic form of BIC is shown in Equation 7. BIC is commonly known as the Schwarz criterion [21].
 
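$$\mathrm{AICc} = \mathrm{AIC} + \frac{2i(i + 1)}{n - i - 1} \tag{6}$$

where i is the number of parameters and n is the sample size.

$$\mathrm{BIC} = -2\ln(L) + K\ln(N) \tag{7}$$

where N is the number of data points in the observed data,

K is the number of parameters to be estimated,

L is the maximised value of the likelihood function for the estimated model.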
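The three criteria in Equations (5) to (7) can be computed together once the maximised log-likelihood is known. The sketch below assumes a least-squares fit with Gaussian errors, for which the log-likelihood follows from the residual sum of squares. That assumption, the function names and the example inputs are illustrative, and VB may compute these statistics differently.

```python
import math

def gaussian_log_likelihood(rss, n):
    """Maximised log-likelihood of a least-squares fit with Gaussian errors."""
    return -0.5 * n * (math.log(2.0 * math.pi) + math.log(rss / n) + 1.0)

def information_criteria(rss, n, k):
    """Return (AIC, AICc, BIC) for residual sum of squares `rss`,
    `n` observations and `k` estimated parameters."""
    if n - k - 1 <= 0:
        raise ValueError("AICc needs n > k + 1")
    log_l = gaussian_log_likelihood(rss, n)
    aic = 2.0 * k - 2.0 * log_l                     # Equation (5)
    aicc = aic + 2.0 * k * (k + 1) / (n - k - 1)    # Equation (6)
    bic = k * math.log(n) - 2.0 * log_l             # Equation (7)
    return aic, aicc, bic

# Hypothetical inputs: 30 observations, 4 parameters, RSS of 2.0.
aic, aicc, bic = information_criteria(rss=2.0, n=30, k=4)
print(aic, aicc, bic)
```

In all three cases a lower value indicates the preferred model. The small-sample correction in AICc shrinks as n grows, while the BIC penalty per parameter exceeds that of AIC once ln(n) is greater than 2.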
 
The PRESS is the sum of the squared external residuals [22], as given in Equations (2) and (3):

$$\mathrm{PRESS} = \sum_{i=1}^{n} e_{(i)}^2 \tag{2}$$

where

$$e_{(i)} = y_i - \hat{y}_{(i)} \tag{3}$$

The external residual e(i) for the ith observation is the difference between the observed value yi and the external predicted value ŷ(i), which is calculated without the use of the ith observation. The external predicted values and external residuals are independent of yi, because yi is not used in fitting the regression model [22]. The best regression model will have the smallest prediction sum of squares.
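The leave-one-out idea behind Equations (2) and (3) can be sketched in Python by refitting the regression once per observation. This is a minimal illustration using NumPy. The function name `press_statistic` and the sample values are assumptions, not data from the study.

```python
import numpy as np

def press_statistic(X, y):
    """PRESS by explicit leave-one-out refitting of an MLR model."""
    X = np.asarray(X, dtype=float)
    y = np.asarray(y, dtype=float)
    design = np.column_stack([np.ones(len(y)), X])
    press = 0.0
    for i in range(len(y)):
        keep = np.arange(len(y)) != i          # drop the ith observation
        coef, *_ = np.linalg.lstsq(design[keep], y[keep], rcond=None)
        external_residual = y[i] - design[i] @ coef
        press += external_residual ** 2
    return float(press)

# Illustrative single-predictor data (made-up values).
X_demo = [[0.0], [1.0], [2.0], [3.0], [4.0]]
y_demo = [1.0, 2.5, 5.2, 6.9, 9.4]
print(press_statistic(X_demo, y_demo))
```

Refitting n times is simple to read but slow for large datasets. For ordinary least squares the same quantity can be obtained from a single fit by dividing each ordinary residual by one minus its leverage.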
 
The VB tool uses a Genetic Algorithm (GA) to search effectively and efficiently for the best possible MLR model [23]. GAs belong to a class of stochastic search procedures called evolutionary algorithms. These algorithms use computational models of natural processes to develop computer-based problem-solving systems [24]. The process mimics the natural biological phenomenon in which organisms produce successive generations. The application of GAs in hydrological and water quality modelling has received attention in recent times [25-28].
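To make the search procedure concrete, the following Python sketch shows a toy genetic algorithm that selects a subset of explanatory variables by minimising AICc. It is not the VB implementation. The function names, the population settings, the least-squares form of AICc (which drops additive constants) and the synthetic data are all assumptions made for illustration. The sketch expects at least two candidate variables.

```python
import math
import random
import numpy as np

def aicc_of_subset(X, y, mask):
    """AICc (least-squares form) of a fit using only the columns flagged in mask."""
    n = len(y)
    cols = [j for j, used in enumerate(mask) if used]
    k = len(cols) + 1  # slopes plus intercept
    if not cols or n - k - 1 <= 0:
        return math.inf
    design = np.column_stack([np.ones(n), X[:, cols]])
    coef, *_ = np.linalg.lstsq(design, y, rcond=None)
    rss = max(float(np.sum((y - design @ coef) ** 2)), 1e-12)  # avoid log(0)
    return n * math.log(rss / n) + 2 * k + 2 * k * (k + 1) / (n - k - 1)

def ga_select(X, y, pop_size=20, generations=30, mutation_rate=0.1, seed=0):
    """Toy genetic algorithm over variable subsets, minimising AICc."""
    rng = random.Random(seed)
    n_vars = X.shape[1]
    population = [[rng.random() < 0.5 for _ in range(n_vars)]
                  for _ in range(pop_size)]
    population[0] = [True] * n_vars  # start with the full model as one candidate
    for _ in range(generations):
        ranked = sorted(population, key=lambda m: aicc_of_subset(X, y, m))
        parents = ranked[: pop_size // 2]   # selection: keep the fitter half
        children = [ranked[0][:]]           # elitism: best survives unchanged
        while len(children) < pop_size:
            a, b = rng.sample(parents, 2)
            cut = rng.randrange(1, n_vars)  # single-point crossover
            child = a[:cut] + b[cut:]
            # mutation: flip each bit with a small probability
            child = [(not g) if rng.random() < mutation_rate else g
                     for g in child]
            children.append(child)
        population = children
    best = min(population, key=lambda m: aicc_of_subset(X, y, m))
    return best, aicc_of_subset(X, y, best)

# Synthetic data: only columns 0 and 2 carry signal (illustrative only).
rng_np = np.random.default_rng(0)
X_demo = rng_np.normal(size=(40, 5))
y_demo = (1.0 + 2.0 * X_demo[:, 0] - 3.0 * X_demo[:, 2]
          + rng_np.normal(scale=0.1, size=40))
best_mask, best_score = ga_select(X_demo, y_demo)
print(best_mask, best_score)
```

Keeping the best candidate unchanged in each generation means the best score never gets worse, so the returned subset is at least as good as the full model that seeds the population.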
 
Results and Discussion
 
Model development
 
The models were derived and their evaluation statistics computed using the VB tool. TP was modelled as the dependent variable and all other variables (NH3-N, COD and pH) as independent variables. This produced various possible models that could be used in practice to estimate TP. For a complete understanding of the permutations used to derive these models, see [23]. Seven (7) models (TPM 1 to TPM 7) with an adjusted R2 greater than 75% were considered appropriate for practical purposes. The models are described in Table 1 with the computed R2a, AICc and BIC statistics for each model. In terms of goodness of fit (R2a), TPM 1 ranked first, followed by TPM 3, TPM 7, TPM 2, TPM 5, TPM 6 and TPM 4, respectively. The AICc further checks the goodness of fit of the model by penalising the addition of independent variables. By this criterion, TPM 1 has the lowest AICc value and is therefore the most appropriate, again followed by TPM 3, TPM 7, TPM 2, TPM 5, TPM 6 and TPM 4. The BIC statistics give the same rankings. In this particular instance, all the statistical tests rank the models identically, so each of these criteria offers a reasonable basis for model selection. This agrees with previous research showing that AICc, R2a and BIC are appropriate techniques for the selection of regression models [29]. Therefore, based on the available data, each of these models is appropriate for practical use. However, if measurements of pH, NH3-N and COD are all available, TPM 1 is the most appropriate.

Table 1: Derived Models and Evaluation Statistics of Each Model
 
These derived models provide a basis for estimating total phosphorus from few measurements. For instance, TPM 7 and TPM 5 can be easily evaluated with measurements of pH and NH3-N. This is economical in practice for water resources management and wastewater treatment monitoring. Also, pH, NH3-N and COD together give a good estimate of TP, as shown by TPM 1. For general restoration projects that involve determining TP, measurements of these three parameters may be adequate.
 
Evaluation
 
The derived models (Table 1) were tested with measurements from another bacterial technology experiment in the Xuxi River, Jiangsu Province, China. Plots of the observed TP against the values predicted by TPM 1 to TPM 7 are shown in Figures 2 and 3. These plots indicate a linear relationship between the observed and predicted values, and all show a relatively high correlation.
 
Figure 2: Illustration of validation plots of (TPM 1, 2, 3 & 4) against TP
 
Figure 3: Illustration of validation plots of (TPM 5, 6 & 7) against TP
 
The derivation of regression models in the field of bacterial technology for streams is relatively new. As a result, it is difficult to compare the findings here with related research. Nevertheless, a recent study on real-time water-quality monitoring, which used regression analysis to estimate nutrient and bacteria concentrations in Kansas streams [10], found that the derived mathematical models produced R2 values of 0.509-0.964 for total phosphorus prediction. That estimation was based on the independent variables turbidity, water temperature and specific conductance. In this research, the independent variables are different. Also, the adjusted R2 (R2a), which is a far better criterion than R2 [13], was used. The R2a values range from 0.793 to 0.858. Comparatively, the results presented in this research show better prediction capabilities.
 
To assess the predictive power of each model, two additional statistics were introduced: the PRESS and the RMSE. Testing the predictive power of a model is essential to establish its reliability and practical relevance. The smallest PRESS value indicates the best predictive model. The results (Table 2) show that TPM 1 is the best choice. It also has the lowest RMSE of 0.167. In general, the other models show a relative RMSE of less than 20%. Measurements in water quality have typical errors in the range of 15-20% for most water quality variables, and sometimes higher (30-40%) for BOD [30,31]. In this case, a prediction error of less than 20% justifies the probable use of all models for forecasting except TPM 3 (RMSE slightly above 20%).
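The RMSE comparison can be sketched in a few lines of Python. The paper does not state how the relative (percentage) RMSE was normalised, so the sketch assumes division by the mean observed value. The function names and the sample numbers are hypothetical.

```python
import math

def rmse(observed, predicted):
    """Root mean square error between two equal-length sequences."""
    if len(observed) != len(predicted) or not observed:
        raise ValueError("inputs must be non-empty and of equal length")
    squared = sum((o - p) ** 2 for o, p in zip(observed, predicted))
    return math.sqrt(squared / len(observed))

def relative_rmse(observed, predicted):
    """RMSE as a fraction of the mean observed value (assumed normalisation)."""
    return rmse(observed, predicted) / (sum(observed) / len(observed))

# Made-up observed and predicted TP values, for illustration only.
observed_tp = [1.0, 2.0, 3.0]
predicted_tp = [1.0, 2.0, 5.0]
print(rmse(observed_tp, predicted_tp), relative_rmse(observed_tp, predicted_tp))
```

A relative RMSE computed this way can be compared directly with the 15-20% measurement error quoted for most water quality variables.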
 
Table 2: Predictive capacities and evaluation of different models
 
TPM 1 shows the best predictive power, in agreement with the AICc and BIC statistics (Table 2). It is therefore strongly recommended for predictive work under bacterial technology in urban streams and related studies. These algorithms are expected to be useful for the rapid reporting of preliminary results to the stakeholders involved in such projects. However, the difference in performance between TPM 1 and the other models is not significant in practical terms. As a result, depending on the work required or the measured data available, each of these models has a good chance of providing reliable results, as discussed.
 
Conclusion
 
Bacterial technology for restoring the water quality of urbanised polluted rivers promises to be appropriate for the majority of polluted rivers in China. The past decade has seen some experimental and pilot projects using this technology in parts of China, and related data on nutrient concentrations have been collected during these projects. However, there is limited or no work on mathematical models for forecasting and planning biological treatment programs in urban streams or rivers. The models derived here performed well against the AICc, BIC and RMSE statistics. The recorded RMSE was less than 20% for all models except TPM 3 (20.2%), which is comparable with the measurement errors (15-20%) associated with water quality variables. TPM 1 is the best of the seven (7) models. However, the difference between the models is small in practical terms, and hence all models are adequate depending on the type and use of the research. The models derived in this research are expected to contribute to water quality monitoring and management of urban rivers and streams in China. It is also believed that the algorithms developed here may be applicable to urban rivers and streams in other parts of the world with conditions similar to the Class V standard grouping of the Chinese National Board.
 
 
References