Ali, Mohamed
Aly, Adel AbdelSabour Ahmad
2025-01-23
2024
Aly_washington_0250E_27716.pdf
https://hdl.handle.net/1773/52763
Thesis (Ph.D.)--University of Washington, 2024

Artificial intelligence (AI) impresses us daily, outperforming humans in complex games and tasks. Yet AI and Large Language Models (LLMs) struggle to grasp a language that is thousands of years old: Arabic, where subtle diacritical marks can completely alter a word's meaning. Top language models such as ChatGPT and Google's Gemini face challenges with Arabic's unique features, potentially leading to critical misunderstandings. The main obstacle is adapting successful NLP systems from other languages to Arabic without understanding its distinct nuances. This dissertation presents a support system, a Multimodal Integration System, for diacritic-aware Classical Arabic language processing. The system integrates speech, text, and vision modalities to address the unique challenges of Arabic's rich linguistic features, such as diacritics and linguistic styles. Arabic has multiple correct linguistic styles, each preserving the same text but with different diacritics. These variations reflect regional dialects, add meaning, and alter grammar and rhetoric. The Holy Quran, with its 20 linguistic styles based on a single core text, serves as our ideal dataset. We developed innovative databases and models that push the boundaries of Arabic language processing. Our scalable databases store texts in 7 different Arabic linguistic styles. QR-Vision excels at recognizing precise diacritics in images. QRDiaRec adds diacritics to undiacritized text in various styles. QRSR and DASAM specialize in speech processing and alignment for Arabic diacritical text and voice. SemSim stands out as a dual-space similarity explorer, analyzing numeric and semantic data.
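The ambiguity that motivates this work can be seen directly in Unicode: Arabic diacritics (harakat) are combining marks, so distinct words collapse to the same letter skeleton once diacritics are removed. A minimal illustration (the example words are standard textbook forms of the root k-t-b, not drawn from the dissertation's data):

```python
import unicodedata

def strip_diacritics(text: str) -> str:
    """Remove combining marks (Unicode category 'Mn'), keeping base letters.
    Arabic harakat such as fatha, damma, and kasra fall in this category."""
    return "".join(ch for ch in text if unicodedata.category(ch) != "Mn")

kataba = "كَتَبَ"   # kataba: "he wrote"
kutub  = "كُتُبٌ"   # kutubun: "books"
kutiba = "كُتِبَ"   # kutiba: "it was written"

# All three distinct words collapse to the same undiacritized skeleton:
assert strip_diacritics(kataba) == strip_diacritics(kutub) \
       == strip_diacritics(kutiba) == "كتب"
```

This is exactly the inverse of the diacritization task: recovering the correct marks from the bare skeleton requires context, which is why it is treated as a learning problem.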
Our methodology involves advanced techniques in data modeling, data-quality validation, deep learning, computer vision, signal processing, and interactive data visualization.

Our findings demonstrate improvements in addressing key challenges in Arabic Natural Language Processing (NLP). For text processing, we created an Automated Diacritization Deep Learning model. The model supports multiple Arabic diacritical styles, a unique feature in the field. Our best-performing model achieved a 94.2% accuracy rate in adding correct diacritics to Arabic text. In image processing, we built a specialized Optical Character Recognition (OCR) model for diacritic-aware Arabic text. Our OCR model reached an accuracy rate of 91.67%, an improvement over existing models. For speech processing, we developed two key systems. The first, operating at the sentence level, combines our novel FuzTPI algorithm with machine learning models; this hybrid approach achieved up to 96% accuracy in audio segmentation and text-audio classification. The second focuses on word-level segmentation and alignment for Arabic diacritic-based speech, achieving R² values of 0.959 for word start times and 0.957 for end times. These results show an improvement over existing Arabic speech recognition technologies. The dissertation is structured around these three modalities, with each section detailing the challenges, methodologies, and results achieved in processing Arabic with diacritics. These findings have far-reaching implications for applications such as machine translation, information retrieval, speech recognition, natural language understanding, educational technology, and the preservation of linguistic heritage.
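The diacritization accuracy figures above are per-position measures of agreement between a model's output and a reference. As a simplified sketch (the dissertation's exact scoring metric may differ, and the word pair below is hypothetical), one way to score a diacritization hypothesis is position-by-position comparison:

```python
def diacritic_accuracy(reference: str, predicted: str) -> float:
    """Fraction of character positions where prediction matches the reference.
    Assumes both strings have the same length and letter skeleton."""
    assert len(reference) == len(predicted)
    matches = sum(r == p for r, p in zip(reference, predicted))
    return matches / len(reference)

ref  = "كُتِبَ"  # reference: kutiba, "it was written"
pred = "كُتُبَ"  # hypothetical model output with one wrong haraka
print(round(diacritic_accuracy(ref, pred), 3))  # 5 of 6 positions agree: 0.833
```

Real evaluations often restrict the count to diacritic-bearing positions (a diacritic error rate); the sketch counts every character for simplicity.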
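The word-level alignment system is evaluated with the coefficient of determination (R²) between predicted and reference word boundary times. A self-contained sketch of that standard computation, using hypothetical timing values rather than the dissertation's data:

```python
def r_squared(y_true: list[float], y_pred: list[float]) -> float:
    """Coefficient of determination: 1 - SS_residual / SS_total."""
    mean = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean) ** 2 for t in y_true)
    return 1.0 - ss_res / ss_tot

true_starts = [0.00, 0.42, 0.91, 1.37]  # reference word start times (seconds)
pred_starts = [0.02, 0.40, 0.95, 1.33]  # hypothetical predicted start times
print(round(r_squared(true_starts, pred_starts), 3))  # → 0.996
```

An R² near 1.0, as reported above for start and end times, means predicted boundaries track the reference almost exactly.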
By addressing the unique challenges of Arabic diacritics across multiple modalities, this research paves the way for more nuanced and culturally sensitive AI applications in Arabic-speaking contexts.

application/pdf
en-US
CC BY
Keywords: Artificial Intelligence (AI) Applications; Deep Learning; Diacritic-Aware Arabic OCR; Multimodal Integration System; Natural Language Processing; Speech Recognition; Computer science; Artificial intelligence; Computer science and systems - Tacoma
A Support System for Diacritic-aware Classical Arabic Language Processing: Integration of Speech, Text, and Vision Modalities
Thesis