Predicting Operational Lifetimes in Hybrid Perovskite Solar Cells: A Case Study of Machine Learning with Small Datasets
| dc.contributor.advisor | Hillhouse, Hugh W. | |
| dc.contributor.author | Sunkari, Preetham Paul | |
| dc.date.accessioned | 2025-08-01T22:17:46Z | |
| dc.date.available | 2025-08-01T22:17:46Z | |
| dc.date.issued | 2025-08-01 | |
| dc.date.submitted | 2025 | |
| dc.description | Thesis (Ph.D.)--University of Washington, 2025 | |
| dc.description.abstract | Mixed organic-inorganic halide perovskite solar cells (PSCs) have emerged as promising candidates for photovoltaic applications due to their exceptional optoelectronic properties, high defect tolerance, low cost, and ease of fabrication. However, their instability under environmental stress remains a major barrier to commercial viability. Recent studies have focused on understanding the degradation mechanisms in these solar cells under varied environmental conditions to develop predictive models for operational lifetimes. Given the limited understanding of these mechanisms, data-driven approaches, particularly those leveraging machine learning (ML) tools guided by domain knowledge, have become central to lifetime prediction. However, due to the significant time and cost associated with generating the requisite data on PSC degradation, available datasets are typically small, often containing only hundreds, or even just dozens, of meticulously gathered trials. This scarcity of data is a common challenge not only in PSC research but also in other experimental fields within chemistry and materials science, limiting the effectiveness of popular data-hungry ML techniques such as neural networks. In this work, I present a case study using in-house PSC degradation datasets to explore the challenges and strategies of modeling with small data. First, I outline criteria for identifying when a dataset falls within the small data regime or is considered too small for reliable predictive modeling. I then review recent ML advances tailored to such contexts and present a modeling workflow that includes feature construction, feature selection, assessment of metrics (such as sparsity level, false-discovery and ground-truth recovery), and uncertainty quantification. Using simulated datasets generated from known ground-truth variables, I demonstrate that prediction error alone is not always the most reliable metric for feature selection. These simulations, designed to reflect real-world scientific conditions, yield heuristic insights that are then applied to real PSC datasets. Overall, with the help of the in-house PSC degradation datasets, I present a practical framework to guide researchers in selecting appropriate machine learning methods based on data availability. This work aims to educate scientists and engineers on statistically rigorous modeling techniques for small datasets. | |
| dc.embargo.terms | Open Access | |
| dc.format.mimetype | application/pdf | |
| dc.identifier.other | Sunkari_washington_0250E_28412.pdf | |
| dc.identifier.uri | https://hdl.handle.net/1773/53447 | |
| dc.language.iso | en_US | |
| dc.rights | CC BY | |
| dc.subject | Feature Selection | |
| dc.subject | Machine Learning | |
| dc.subject | Perovskites | |
| dc.subject | Small data | |
| dc.subject | Solar Cells | |
| dc.subject | Chemical engineering | |
| dc.subject | Materials Science | |
| dc.subject | Statistics | |
| dc.subject.other | Chemical engineering | |
| dc.title | Predicting Operational Lifetimes in Hybrid Perovskite Solar Cells: A Case Study of Machine Learning with Small Datasets | |
| dc.type | Thesis | |
Files
Original bundle
- Name: Sunkari_washington_0250E_28412.pdf
- Size: 8.82 MB
- Format: Adobe Portable Document Format
