AQTech Power Prognostics

Anomaly Detection in Regulation Ring from Bulb Turbines using Machine Learning

Updated: Nov 23, 2023


Hydroelectric power is a long-established and widely used renewable energy source. With energy demand rising, hydroelectric plants are operated under demanding conditions, which puts pressure on Operation and Maintenance (O&M). Minimizing downtime and identifying faults quickly therefore remain ongoing priorities. Advanced predictive maintenance strategies are a promising way to improve the efficiency and reliability of hydroelectric plants: they aim to keep the facilities continuously operable and thereby optimize their performance.

Failures can cause costly downtime and significant financial, material, and even human losses for the operating company. It is therefore of great interest for energy companies to find ways to prevent failures proactively, moving from traditional corrective and preventive maintenance to the more proactive approach of predictive maintenance.

The turbines examined in this study are Bulb-type generating units, which consist mainly of buried parts, generators, water guides, main shafts, bearings, runners, and concrete elements. These components are arranged along a horizontal shaft, which makes the Bulb a horizontal-shaft turbine.

The wicket gates are operated by hydraulic pistons that drive the regulation ring, which in turn is connected to a lever system that sets the position of the guide vanes. Hydroelectric generation depends on converting the hydraulic energy of the water flow into electricity. However, the interaction between the flow and the turbine can produce undesirable loads and vibrations throughout the entire generating unit, eventually leading to fatigue and anomalies [1].

With an adequate vibration monitoring system, it is possible to check whether the vibration spectrum of the machine shows anomalies correlated with this failure mode. Properly analyzed, these measurements provide valuable information about the machine's health.

Isolation forest is an unsupervised, tree-based ensemble method for anomaly detection. It belongs to the broader field of machine learning, which has pushed the boundaries of many domains. Isolation forest has been applied successfully in the electric power industry and related industrial settings, as reported in [2], [3], and [4], and an extended version of the method is presented in [5].

Supervised learning methods require labelled data that distinguish between the failure modes of interest, i.e., samples pre-classified as either 'healthy' or 'faulty', on which the model is trained. However, most hydroelectric power plants operate continuously and without interruption, so gathering data that covers all machine conditions, including both normal and faulty states, is a non-trivial challenge.

To address this problem, this work proposes an unsupervised machine learning approach. It aims to verify whether an isolation forest model can automatically extract useful information from the signals and detect anomalies in the regulation ring of a Bulb-type turbine.

2. Methodology

This section presents the methodology of the study, which is divided into five steps: data acquisition and processing, feature extraction, training of the machine learning model, anomaly detection, and validation of the results.

2.1 Data acquisition

This study uses signal records from the regulation ring of a Generating Unit (GU) at the Santo Antônio Hydropower Plant in Porto Velho, RO, Brazil. The regulation ring is equipped with two proximeters positioned radially at 120° (DAR-120) and 240° (DAR-240), which measure displacement at a sampling rate of 4 kHz. The database contains records collected between 2020 and 2021. Figure 1 shows an example of a turbine regulation ring and the positions of the sensors. In total, approximately 1,168 signals were acquired for analysis.

Fig. 1. Regulation ring example with sensors.
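As a minimal sketch of how one acquired record might be held in memory, the snippet below assumes each record stores the two proximeter channels (DAR-120 and DAR-240) sampled at 4 kHz. The `SignalRecord` class, its field names, and the 8000-sample (2 s) record length in the usage line are hypothetical; the article does not describe the storage format or the record duration.

```python
from dataclasses import dataclass
from datetime import datetime

import numpy as np

FS_HZ = 4000.0  # proximeter sampling rate reported in the text (4 kHz)


@dataclass
class SignalRecord:
    """Hypothetical container for one simultaneous record of both ring proximeters."""
    timestamp: datetime
    dar_120: np.ndarray  # radial displacement from the probe at 120 degrees
    dar_240: np.ndarray  # radial displacement from the probe at 240 degrees
    fs_hz: float = FS_HZ

    def __post_init__(self):
        # Both channels are assumed to be acquired together, so lengths must match
        if self.dar_120.shape != self.dar_240.shape:
            raise ValueError("DAR-120 and DAR-240 must have the same number of samples")

    @property
    def duration_s(self) -> float:
        return self.dar_120.size / self.fs_hz

    @property
    def nyquist_hz(self) -> float:
        # Highest frequency representable without aliasing
        return self.fs_hz / 2.0

    @property
    def freq_resolution_hz(self) -> float:
        # Spacing between FFT bins for a record of this length
        return self.fs_hz / self.dar_120.size


rec = SignalRecord(datetime(2020, 7, 1), np.zeros(8000), np.zeros(8000))
print(rec.duration_s, rec.nyquist_hz, rec.freq_resolution_hz)  # 2.0 2000.0 0.5
```

At 4 kHz, any spectral feature can describe content only up to 2 kHz, and longer records give finer frequency resolution.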

2.2 Feature extraction

Feature extraction generates, from an initial set of physical variables, a new set of variables with better discriminatory ability and lower dimensionality. Features are extracted in both the time and the frequency domain, since each domain carries relevant information. The 34 extracted features are listed in Figure 2.

Fig. 2. Feature extraction process.
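The 34 features appear only in Figure 2, so the sketch below does not reproduce them. It is an illustrative assumption of what this step can look like, computing a handful of common time-domain (RMS, peak-to-peak, crest factor, skewness, kurtosis) and frequency-domain (dominant frequency, spectral centroid, spectral energy) features from one displacement signal with NumPy. The function names are hypothetical.

```python
import numpy as np


def time_features(x):
    """A few common time-domain features of a displacement signal."""
    x = np.asarray(x, dtype=float)
    mean, std = x.mean(), x.std()
    rms = np.sqrt(np.mean(x ** 2))
    peak = np.max(np.abs(x))
    centered = x - mean
    return {
        "mean": mean,
        "std": std,
        "rms": rms,
        "peak": peak,
        "peak_to_peak": x.max() - x.min(),
        "crest_factor": peak / rms if rms > 0 else 0.0,
        # Population skewness and (non-excess) kurtosis: kurtosis is 3.0 for Gaussian noise
        "skewness": np.mean(centered ** 3) / std ** 3 if std > 0 else 0.0,
        "kurtosis": np.mean(centered ** 4) / std ** 4 if std > 0 else 0.0,
    }


def frequency_features(x, fs_hz):
    """A few spectral features from the one-sided amplitude spectrum."""
    x = np.asarray(x, dtype=float)
    x = x - x.mean()                                # remove DC so it does not dominate
    amp = np.abs(np.fft.rfft(x)) * 2.0 / x.size     # one-sided amplitude (DC/Nyquist bins not corrected)
    freqs = np.fft.rfftfreq(x.size, d=1.0 / fs_hz)  # bin frequencies in Hz
    power = amp ** 2
    total = power.sum()
    return {
        "dominant_freq_hz": freqs[np.argmax(amp)],
        "dominant_amplitude": amp.max(),
        "spectral_centroid_hz": (freqs * power).sum() / total if total > 0 else 0.0,
        "spectral_energy": total,
    }


def extract_features(x, fs_hz=4000.0):
    """Concatenate time- and frequency-domain features into one flat dict."""
    return {**time_features(x), **frequency_features(x, fs_hz)}


fs = 4000.0
t = np.arange(4000) / fs                    # 1 s at 4 kHz
x = 2.0 * np.sin(2 * np.pi * 60.0 * t)      # synthetic 60 Hz tone, not plant data
feats = extract_features(x, fs)
print(len(feats), feats["dominant_freq_hz"], feats["rms"])
```

Applying the same function to every record produces one feature row per signal, which is the tabular input an isolation forest expects.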

2.3 Machine Learning - Isolation Forest

The core principle of the Isolation Forest algorithm is to isolate anomalies by building isolation trees on random subsamples of the data: each tree recursively partitions its subsample by picking a random feature and a random split value between that feature's minimum and maximum. These trees exploit the fact that anomalies are typically few in number and have characteristics that differ from those of normal data points. For each data point, the algorithm follows the path from the root of a tree to an external node, and anomalies tend to have shorter path lengths because they are easier to separate [6]. Averaging the path lengths across many trees yields a robust anomaly score.
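The principle above can be sketched from scratch in a few dozen lines. This is a simplified, illustrative implementation following the scheme in [6], assuming a subsample size of 256, a height limit of ceil(log2(256)) = 8, and the score s = 2^(-E[h(x)]/c(n)), where c(n) is the average path length of an unsuccessful binary-search-tree lookup. It is not the authors' code, and it simplifies one detail: a node stops splitting if the randomly chosen feature happens to be constant.

```python
import math
import random

EULER_GAMMA = 0.5772156649015329


def c(n):
    """Average path length of an unsuccessful BST search over n points (normalizer in [6])."""
    if n <= 1:
        return 0.0
    if n == 2:
        return 1.0
    return 2.0 * (math.log(n - 1) + EULER_GAMMA) - 2.0 * (n - 1) / n


def build_tree(X, depth, max_depth, rng):
    """Recursively split X on a random feature at a random value until points are isolated."""
    if len(X) <= 1 or depth >= max_depth:
        return {"size": len(X)}                       # external (leaf) node
    q = rng.randrange(len(X[0]))                      # random feature index
    lo, hi = min(x[q] for x in X), max(x[q] for x in X)
    if lo == hi:                                      # simplification: stop if this feature is constant
        return {"size": len(X)}
    p = rng.uniform(lo, hi)                           # random split value in [lo, hi]
    return {
        "q": q,
        "p": p,
        "left": build_tree([x for x in X if x[q] < p], depth + 1, max_depth, rng),
        "right": build_tree([x for x in X if x[q] >= p], depth + 1, max_depth, rng),
    }


def path_length(x, node, depth=0):
    """Depth at which x reaches a leaf, plus c(size) for the unexpanded part of that leaf."""
    if "size" in node:
        return depth + c(node["size"])
    child = node["left"] if x[node["q"]] < node["p"] else node["right"]
    return path_length(x, child, depth + 1)


def fit_forest(X, n_trees=100, sample_size=256, seed=0):
    """Build n_trees isolation trees, each on a random subsample of size psi."""
    rng = random.Random(seed)
    psi = min(sample_size, len(X))
    if psi < 2:
        raise ValueError("need at least 2 training points")
    max_depth = math.ceil(math.log2(psi))             # height limit used in [6]
    trees = [build_tree(rng.sample(X, psi), 0, max_depth, rng) for _ in range(n_trees)]
    return trees, psi


def anomaly_score(x, trees, psi):
    """s = 2^(-E[h(x)] / c(psi)): near 1 suggests an anomaly, well below 0.5 a normal point."""
    mean_h = sum(path_length(x, t) for t in trees) / len(trees)
    return 2.0 ** (-mean_h / c(psi))


# Usage on synthetic 2-D data (not plant data)
data_rng = random.Random(1)
normal = [[data_rng.gauss(0, 1), data_rng.gauss(0, 1)] for _ in range(500)]
trees, psi = fit_forest(normal)
center = anomaly_score([0.0, 0.0], trees, psi)   # point in the dense region
outlier = anomaly_score([8.0, 8.0], trees, psi)  # point far from the training data
print(center, outlier)
```

The far point is separated after only a few random splits, so its average path is short and its score is higher than that of the central point.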

During model training, the data were split into two sets: 80% for training and 20% for testing, so that the model's performance would be evaluated without bias on data it had not seen. Allocating 80% of the data to training exposes the model to a substantial number of representative examples of the inherent data patterns, which helps it learn the features and nuances associated with both normal and anomalous observations. The remaining 20% provides an independent set for evaluating the model's ability to generalize.
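A minimal sketch of this step with scikit-learn, assuming a feature table of 1,168 rows by 34 columns. The random matrix is only a stand-in for the real features, and `n_estimators=100` and `max_samples="auto"` (which uses min(256, n_samples)) are scikit-learn's defaults rather than settings reported in the article.

```python
import numpy as np
from sklearn.ensemble import IsolationForest
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(0)
# Hypothetical stand-in for the feature table: 1168 records x 34 features (random, not plant data)
X = rng.normal(size=(1168, 34))

# 80 % of the records for training, 20 % held out for testing
X_train, X_test = train_test_split(X, test_size=0.2, random_state=0)

# Unsupervised: no labels are passed to fit()
model = IsolationForest(n_estimators=100, max_samples="auto", random_state=0)
model.fit(X_train)

pred = model.predict(X_test)          # +1 = normal (inlier), -1 = anomaly (outlier)
score = model.score_samples(X_test)   # lower values = more anomalous
is_anomaly = pred == -1
print(X_train.shape, X_test.shape, int(is_anomaly.sum()))
```

Because the model never sees labels, any labels available for the test records can be kept aside and used only to build the confusion matrix afterwards.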

To assess how well the proposed methodology detects anomalies, accuracy and precision were used as evaluation metrics. Both are derived from the four possible outcomes of a binary classification, which describe where a model succeeds and where it fails relative to the expected results [7]:

  • True Positives (TP): Correct classification of the positive class;

  • False Negatives (FN): Error where the model predicted the negative class when the actual value was the positive class;

  • False Positives (FP): Error where the model predicted the positive class when the actual value was the negative class;

  • True Negatives (TN): Correct classification of the negative class.
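The four outcomes above can be counted directly from paired ground-truth and predicted labels. This sketch assumes the anomaly is the positive class; the `confusion_counts` helper and the six example labels are hypothetical and do not come from this study.

```python
def confusion_counts(y_true, y_pred, positive="anomaly"):
    """Count TP, FN, FP, TN for a binary problem, treating `positive` as the positive class."""
    if len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must have the same length")
    tp = fn = fp = tn = 0
    for actual, predicted in zip(y_true, y_pred):
        if actual == positive and predicted == positive:
            tp += 1  # anomaly correctly flagged
        elif actual == positive:
            fn += 1  # anomaly missed
        elif predicted == positive:
            fp += 1  # false alarm on a normal record
        else:
            tn += 1  # normal record correctly left unflagged
    return {"TP": tp, "FN": fn, "FP": fp, "TN": tn}


# Illustrative labels only (not data from this study)
y_true = ["normal", "anomaly", "anomaly", "normal", "normal", "anomaly"]
y_pred = ["normal", "anomaly", "normal", "anomaly", "normal", "anomaly"]
print(confusion_counts(y_true, y_pred))  # {'TP': 2, 'FN': 1, 'FP': 1, 'TN': 2}
```

scikit-learn's `IsolationForest.predict` marks outliers as -1, so `positive=-1` would be the matching choice if the ground truth is encoded the same way.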

Equation 1 defines accuracy, which gives an overall assessment of the model's performance: the proportion of correctly classified samples among all samples.

  (1) Accuracy = (TP + TN) / (TP + TN + FP + FN)

Equation 2 defines precision, which measures how reliable the model's positive-class predictions are: the proportion of correct predictions among all the samples the model assigned to the positive class.

  (2) Precision = TP / (TP + FP)
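Equations 1 and 2 translate directly into code. The counts in the usage lines are illustrative only and are not the values behind Figure 3.

```python
def accuracy(tp, tn, fp, fn):
    """Equation 1: share of all samples that were classified correctly."""
    total = tp + tn + fp + fn
    if total == 0:
        raise ValueError("no samples")
    return (tp + tn) / total


def precision(tp, fp):
    """Equation 2: share of positive predictions that were actually positive."""
    if tp + fp == 0:
        raise ValueError("no positive predictions; precision is undefined")
    return tp / (tp + fp)


# Illustrative counts only
tp, tn, fp, fn = 2, 2, 1, 1
print(accuracy(tp, tn, fp, fn))  # 4/6 = 0.6666666666666666
print(precision(tp, fp))         # 2/3 = 0.6666666666666666
```

Precision ignores false negatives, so a model can score high precision while still missing anomalies; this is why it is read together with the confusion matrix.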

3. Results

Figure 3 summarizes the main outcome of this study: the model's correct detections, true positives (TP) and true negatives (TN), together with its errors. The data labels were used exclusively for validation; the model never had access to them at any stage, which is what makes the approach unsupervised. From the model's performance, the accuracy is 93%. Precision, computed from the TP and FP counts, is 94%.

Fig. 3. Confusion Matrix: TP, TN, FP and FN.

Comparing the anomalies recorded by the plant's field technicians with the model's predictions (Figure 4) reveals moments when the model is wrong. For example, false positives occur in July 2020 and January 2021, while false negatives appear in a few instances, such as January and March 2020.

Fig. 4. Anomalies over time: Real detection and detection predicted by the model.

Anomaly accumulation over time refers to the gradual buildup of deviations, irregularities, or abnormalities in the regulation ring over an extended period. As time passes, departures from the expected, normal behavior become more pronounced.

Comparing the accumulation curves built from the technicians' detections and from the model's predictions (Figure 5) shows that the curves diverge where the model makes errors; by the end of the period, they differ by approximately 50 anomalies. If an alarm or shutdown threshold were defined on the accumulated count, the isolation forest model would likely have triggered an alert earlier than the actual detection.

Fig. 5. Accumulation of anomalies over time: Actual accumulation and Accumulation predicted by the model.
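The accumulation curves and the threshold idea can be sketched as a running count over time-ordered anomaly flags. The flag sequences and the threshold of 3 below are hypothetical, chosen only to show how a predicted curve that rises faster would cross an alarm level sooner; they are not the plant's data or a threshold proposed by the authors.

```python
from itertools import accumulate


def cumulative_anomalies(flags):
    """Running count of anomalies; flags are 1 (anomaly) or 0 (normal), in time order."""
    return list(accumulate(flags))


def first_alarm(flags, threshold):
    """Index of the first record at which the running count reaches `threshold`, or None."""
    for i, total in enumerate(accumulate(flags)):
        if total >= threshold:
            return i
    return None


# Illustrative, time-ordered flags (not data from this study)
actual = [0, 0, 1, 0, 0, 1, 0, 1, 1, 0]
predicted = [0, 1, 1, 0, 1, 1, 0, 1, 0, 1]
threshold = 3  # hypothetical alarm threshold

print(cumulative_anomalies(actual))     # [0, 0, 1, 1, 1, 2, 2, 3, 4, 4]
print(cumulative_anomalies(predicted))  # [0, 1, 2, 2, 3, 4, 4, 5, 5, 6]
print(first_alarm(actual, threshold), first_alarm(predicted, threshold))  # 7 4
```

The gap between the last values of the two curves plays the role of the roughly 50-anomaly difference seen in Figure 5.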

4. Conclusion

In conclusion, this study provides insight into the performance of an isolation forest-based model for fault detection in the regulation ring. The model achieved an accuracy of 93% and a precision of 94%, indicating that it identifies anomalies correctly in most cases. One notable finding concerns anomaly accumulation over time: as deviations and irregularities accumulate, they become more pronounced, and the model's errors produce differences in the cumulative anomaly count.

However, comparing the model's predictions with the real detections by field technicians shows that the model is not flawless. It occasionally produces false positives, as in July 2020 and January 2021, and false negatives, as in January and March 2020. These discrepancies may be attributed to the inherent complexity of real-world data and the dynamic nature of the hydroelectric system.

Furthermore, this approach can serve as a tool to assist maintenance teams in their transition from reactive corrective maintenance to proactive predictive maintenance. This strategy therefore has the potential to strengthen the practice of predictive maintenance in the electricity sector, promoting operational efficiency and minimizing unplanned downtime.


References

1. Cutrim, T. H. P., & Lourenço, R. R. "Análise de fadiga de pinos de cisalhamento do anel de regulação de unidades hidrogeradoras" [Fatigue analysis of the shear pins of the regulation ring of hydrogenerator units]. 2011.

2. Hara, Y., Fukuyama, Y., Murakami, K., Iizaka, T., & Matsui, T. "Fault detection of hydroelectric generators using isolation forest". In 2020 59th Annual Conference of the Society of Instrument and Control Engineers of Japan (SICE) (pp. 864-869). IEEE, 2022.

3. Ahmed, S., Lee, Y., Hyun, S. H., & Koo, I. "Unsupervised machine learning-based detection of covert data integrity assault in smart grid networks utilizing isolation forest". IEEE Transactions on Information Forensics and Security, 14(10), 2765-2777, 2019.

4. Ribeiro, D., Matos, L. M., Moreira, G., Pilastri, A., & Cortez, P. "Isolation forests and deep autoencoders for industrial screw tightening anomaly detection". Computers, 11(4), 54, 2022.

5. Hariri, S., Kind, M. C., & Brunner, R. J. "Extended isolation forest". IEEE Transactions on Knowledge and Data Engineering, 33(4), 1479-1489, 2019.

6. Liu, F. T., Ting, K. M., & Zhou, Z. H. "Isolation forest". In 2008 Eighth IEEE International Conference on Data Mining (pp. 413-422). IEEE, 2008.

7. Grandini, M., Bagli, E., & Visani, G. "Metrics for multi-class classification: an overview". arXiv preprint arXiv:2008.05756, 2020.

The Authors

Y. Crotti is a technology professional. He holds a Master's degree in Information and Communication Technology (2020) and a Bachelor's degree in Computer Engineering (2018), both from the Federal University of Santa Catarina. He currently works as a Data Scientist and Machine Learning Developer at AQTech Power Prognostics in Brazil, where he analyzes data from hydrogenerators and wind generators for condition monitoring.

V. Pohlenz graduated in Electrical Engineering from the Federal University of Santa Catarina (UFSC) and is currently pursuing a Master's degree at the same institution. He mainly uses the Python and R programming languages, applying probability and statistics to problems in the electricity sector. He is currently a Data Scientist at AQTech Power Prognostics, developing solutions based on Python and R.

E.L. Nascimento graduated in electronic engineering from the Federal University of Pernambuco and holds a master's degree in electrical engineering from the same university, where he studies AI and digital signal processing. He is currently head of the Condition Analysis Center (CAC) at AQTech Power Prognostics, where he works on condition analysis of hydrogenerators and wind generators, and he is an ISO Certified Level 1 Vibration Analyst.

M.H.N. Nishioka graduated in mechanical engineering from the Federal University of Santa Catarina (UFSC) in 2018. Mechanical Engineer and Vibration Analyst at AQTech since July 2018, focused on developing solutions for diagnosing rotating machines based on vibration analysis. Currently a Master's degree student at UFSC in the area of Vibrations and Acoustics.

T.K. Matsuo holds a master's degree in Mechatronics from IFSC (2017), a degree in Electrical Engineering from UFSC (2010), and an electronics technician qualification (CEFET-SC, 2005). Since 2006 he has worked on developing technologies for monitoring and diagnosing rotating machines, mainly in the electrical sector. He is currently Technical Director at AQTech Power Prognostics, headquartered in Florianópolis.

F.S. Borges graduated in Electrical Engineering from the Federal University of Minas Gerais in 2013 and earned an MBA in Project Management from the Getulio Vargas Foundation in 2016. He has experience in electrical energy distribution and generation. He currently works at Santo Antônio Energia, where he has been involved in the maintenance of generating units since 2020.


