| Article Type |
|
| Article Subtype | Full article published in SSCI, AHCI, SCI, SCI-Exp journals |
| Journal Name | KSII Transactions on Internet and Information Systems |
| Journal ISSN | 1976-7277 |
| Indexes Covering the Journal | SCI-Expanded |
| Article Language | Turkish |
| Publication Date | 03-2024 |
| Volume No | 18 |
| Issue | 3 |
| Pages | 591-609 |
| DOI Number | 10.3837/tiis.2024.03.004 |
| Article Link | http://dx.doi.org/10.3837/tiis.2024.03.004 |
| Abstract |
| This study examined all combinations of preprocessing methods for their effects on reducing the number of words, shortening processing time, and classification success, using balanced datasets and datasets imbalanced at different ratios. The reductions in word count and in processing time achieved by preprocessing were interrelated. Classification was more successful on the Turkish datasets, while the English datasets were more affected by whether the dataset was balanced. In highly imbalanced datasets, documents from classes with few documents were misclassified into a topically close class in the Turkish datasets and into a class with many documents in the English datasets. In terms of average scores, the highest classification success was obtained on the Turkish datasets without lowercasing but with stemming and stop-word removal, and on the English datasets with lowercasing, stemming, and stop-word removal. Stemming was the preprocessing method that most increased success on the Turkish datasets, whereas stop-word removal played that role on the English datasets. The maximum scores showed that feature selection, feature size, and the classifier affect classification success more than preprocessing does. It was concluded that preprocessing is necessary for text classification because it shortens processing time and can achieve high classification success, that a given preprocessing method does not have the same effect in all languages, and that different preprocessing methods are more successful for different languages. |
| Keywords |
| Natural Language Processing | Pattern Recognition | Preprocessing | Text Classification | Text Mining. |
| Journal Name | KSII Transactions on Internet and Information Systems |
| Publisher | Korean Society for Internet Information |
| Open Access | No |
| ISSN | 1976-7277 |
| E-ISSN | 1976-7277 |
| CiteScore | 2.7 |
| SJR | 0.283 |
| SNIP | 0.380 |
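
The abstract describes testing every combination of lowercasing, stemming, and stop-word removal and measuring how each reduces the number of words. The following is a minimal sketch of that idea, not the authors' implementation: the stop-word list, the suffix-stripping `toy_stem` function, and the sample sentence are all hypothetical stand-ins, and a real study would use language-specific stemmers and stop-word lists for Turkish and English.

```python
from itertools import product

# Hypothetical toy resources (assumptions, not taken from the paper).
STOP_WORDS = {"the", "is", "a", "of", "and", "in"}
SUFFIXES = ("ing", "ed", "s")


def toy_stem(word):
    """Strip one common English suffix; a crude stand-in for a real stemmer."""
    for suffix in SUFFIXES:
        if word.endswith(suffix) and len(word) - len(suffix) >= 3:
            return word[: -len(suffix)]
    return word


def preprocess(tokens, lowercase, stem, remove_stop_words):
    """Apply the selected preprocessing steps to a list of tokens."""
    if lowercase:
        tokens = [t.lower() for t in tokens]
    if remove_stop_words:
        # Stop words are matched case-insensitively.
        tokens = [t for t in tokens if t.lower() not in STOP_WORDS]
    if stem:
        tokens = [toy_stem(t) for t in tokens]
    return tokens


def vocabulary_sizes(tokens):
    """Map each (lowercase, stem, remove_stop_words) combination
    to the number of distinct words left after preprocessing."""
    sizes = {}
    for combo in product([False, True], repeat=3):
        sizes[combo] = len(set(preprocess(tokens, *combo)))
    return sizes


if __name__ == "__main__":
    sample = "The cat is chasing the cats and The dog chased a ball".split()
    for combo, size in vocabulary_sizes(sample).items():
        print(combo, size)
```

Three on/off steps give eight combinations. On a real corpus, the same loop could wrap feature extraction and a classifier to compare classification scores per combination, which is the comparison the abstract reports.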