2nd National Conference on the Era of Technological Explosion: Artificial Intelligence, a Transformation in Industry, Commerce, and Supply Chain, and the 2nd National Conference on Data Science in Engineering Applications
Document Clustering Using Deep Pre-trained Language Model Embeddings for Information Retrieval
Authors:
Mahdi Mohammadiha (1)
Mohammad Hassan Sadreddini (2)
Morteza Mohammadi Zanjireh (3)
1- International University of Imam Khomeini
2- International University of Imam Khomeini
3- International University of Imam Khomeini
Keywords:
Document Clustering, Information Retrieval, SBERT, UMAP, HDBSCAN
Abstract :
Document clustering is central to information retrieval (IR) because it improves user navigation, semantic organization, and exploration of large text collections. Current clustering techniques, however, suffer from low accuracy and semantic inconsistency: many rely on superficial textual representations and label relevant documents as noise. This study aims to develop a clustering pipeline that produces semantically meaningful and structurally coherent groups of documents to support more effective IR. We propose a method that combines SBERT embeddings for deep semantic representation, UMAP for structure-preserving dimensionality reduction, and HDBSCAN for flexible, density-based clustering that does not require the number of clusters to be specified in advance. Experimental evaluation on the 20 Newsgroups dataset shows that our best configuration, which uses the paraphrase-mpnet-base-v2 model, achieves a Silhouette Score of 0.6853, an Adjusted Rand Index (ARI) of 0.7865, and a Normalized Mutual Information (NMI) of 0.8186. These results indicate that embedding-based clustering can improve the interpretability and effectiveness of IR systems on real-world text collections.
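To make the two external metrics reported above concrete, the following is a minimal sketch of how ARI and NMI can be computed from a pair of label lists using only the Python standard library. It is an illustration, not the authors' evaluation code: the function names are hypothetical, NMI is normalized by the arithmetic mean of the two entropies (one common convention, assumed here), and noise points, which HDBSCAN implementations commonly label -1, are simply treated as one more cluster.

```python
from collections import Counter
from math import comb, log


def adjusted_rand_index(true_labels, pred_labels):
    """Adjusted Rand Index computed from the contingency table of two labelings."""
    n = len(true_labels)
    if n < 2:
        return 1.0  # fewer than two items: no pairs to disagree on
    cells = Counter(zip(true_labels, pred_labels))
    rows = Counter(true_labels)
    cols = Counter(pred_labels)
    # Count co-clustered pairs in each cell, each true class, each predicted cluster.
    sum_cells = sum(comb(c, 2) for c in cells.values())
    sum_rows = sum(comb(c, 2) for c in rows.values())
    sum_cols = sum(comb(c, 2) for c in cols.values())
    expected = sum_rows * sum_cols / comb(n, 2)  # agreement expected by chance
    max_index = (sum_rows + sum_cols) / 2
    if max_index == expected:
        return 1.0  # degenerate case: both labelings are trivial
    return (sum_cells - expected) / (max_index - expected)


def normalized_mutual_info(true_labels, pred_labels):
    """NMI with arithmetic-mean normalization (an assumed convention)."""
    n = len(true_labels)
    cells = Counter(zip(true_labels, pred_labels))
    rows = Counter(true_labels)
    cols = Counter(pred_labels)
    # Mutual information between the two labelings.
    mi = sum(
        (c / n) * log(c * n / (rows[t] * cols[p]))
        for (t, p), c in cells.items()
    )
    # Entropy of each labeling on its own.
    h_true = -sum((c / n) * log(c / n) for c in rows.values())
    h_pred = -sum((c / n) * log(c / n) for c in cols.values())
    denom = (h_true + h_pred) / 2
    return mi / denom if denom > 0 else 1.0


# Hypothetical toy labels: the clusters match the classes up to renaming,
# with the last group carrying the noise label -1.
truth = [0, 0, 1, 1, 2, 2]
predicted = [5, 5, 3, 3, -1, -1]
ari = adjusted_rand_index(truth, predicted)
nmi = normalized_mutual_info(truth, predicted)
```

Both metrics depend only on how items are grouped, not on the label values themselves, which is why a clustering that matches the reference classes up to renaming scores as a perfect match. Whether noise points should be counted as a cluster or excluded before scoring is a design choice that the abstract does not specify.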