Algorithm comparison on a social platform

Until now I have used newspaper articles in my experiments. This time I decided to use texts extracted from another platform, so I collected texts from Ekşisözlük, which is a kind of local Reddit. I ran a comparison experiment using Turkish gerunds as features.
Here are my experiment components:

    Corpus: The Eksisozluk dataset of 5 authors, each represented by a nickname, with 100 texts per author. The average word count is 461. 80% of the dataset is used as training data and 20% as test data.
    Features: The features are Turkish gerunds. These words are derived from verbs but are used as nouns, adjectives, or adverbs in a sentence. I listed the most widely used verbs in Turkish and then derived gerunds from them by adding gerund suffixes. In the end I obtained 590 verbal nouns, 587 verbal adjectives and 916 verbal adverbs (including the proper vowel variants).
    Algorithms: The algorithms are Linear SVM, Multi-Layer Perceptron (MLP), Naive Bayes (NB), k-Nearest Neighbor (kNN) and Decision Tree.
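The feature step above can be sketched roughly as follows. This is only a minimal illustration, not the exact procedure used in the experiment: the function names (`derive_gerunds`, `gerund_frequencies`), the three suffixes chosen (-mak/-mek, -an/-en, -arak/-erek), the exact-match counting and the normalization by text length are all assumptions. It handles two-way vowel harmony and the buffer "y" after vowel-final stems, and ignores other phenomena such as consonant softening.

```python
import re
from collections import Counter

BACK_VOWELS = set("aıou")
FRONT_VOWELS = set("eiöü")
VOWELS = BACK_VOWELS | FRONT_VOWELS


def last_vowel(stem):
    """Return the last vowel of a verb stem; it governs vowel harmony."""
    for ch in reversed(stem):
        if ch in VOWELS:
            return ch
    raise ValueError(f"no vowel in stem: {stem!r}")


def derive_gerunds(stem):
    """Derive one verbal noun, adjective and adverb from a verb stem.

    Assumption: only a/e harmony and the buffer 'y' are handled.
    """
    v = "a" if last_vowel(stem) in BACK_VOWELS else "e"
    buffer = "y" if stem[-1] in VOWELS else ""
    return {
        "noun": stem + "m" + v + "k",                  # -mak / -mek
        "adjective": stem + buffer + v + "n",          # -an / -en
        "adverb": stem + buffer + v + "r" + v + "k",   # -arak / -erek
    }


def tokenize(text):
    """Lowercase with Turkish dotted/dotless i handling, keep letter runs."""
    text = text.replace("I", "ı").replace("İ", "i").lower()
    return re.findall(r"[^\W\d_]+", text)


def gerund_frequencies(text, gerund_list):
    """Relative frequency of each listed gerund (exact token match only)."""
    tokens = tokenize(text)
    counts = Counter(tokens)
    total = len(tokens) or 1  # avoid division by zero on empty text
    return [counts[g] / total for g in gerund_list]


# Hypothetical mini example with three verb stems.
stems = ["gel", "bak", "oku"]
gerund_list = sorted({g for s in stems for g in derive_gerunds(s).values()})
vector = gerund_frequencies("Gelen adam bakarak okumak istedi.", gerund_list)
```

Each text becomes one such vector, and the vectors of the training texts are what a classifier would be fitted on. Because matching is exact, further inflected forms of a gerund are not counted in this sketch.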

Now, the results are below.

    SVM


The performance of SVM with gerund frequencies as features is not satisfactory. It classified only 3 of 5 authors with at least 12 of 20 test documents matched correctly.

    MLP


The performance of MLP with gerund features is slightly better than SVM: it classified 4 of 5 authors with at least 12 of 20 test documents matched correctly.

    NB


The performance of NB is average and close to the other results: it classified 3 of 5 authors with at least 12 of 20 test documents matched correctly.

    Decision Tree


The performance of the decision tree is not good enough: the average F1-score is 0.39. It did not produce a satisfactory number of correct matches.

    kNN


The performance of kNN is also not good enough, but it is slightly better than the decision tree: the average F1-score is 0.44. It classified only one of 5 authors well, matching 16 of 20 test documents correctly.

As a result, NB, kNN and decision tree are not suitable algorithms for this approach. SVM and MLP performed better than the other algorithms.
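For reference, the two measures used in the comparison (the average F1-score, and the number of authors with at least 12 of 20 test documents matched correctly) could be computed along these lines. This is a sketch under assumptions: the function names are hypothetical, the labels in the example are toy data rather than results from the experiment, and "average" is taken to mean the unweighted mean over authors. With 20 test documents per author, that mean equals the support-weighted average.

```python
from collections import Counter


def per_author_f1(y_true, y_pred):
    """Return {author: F1} computed from true and predicted labels."""
    tp, fp, fn = Counter(), Counter(), Counter()
    for t, p in zip(y_true, y_pred):
        if t == p:
            tp[t] += 1
        else:
            fp[p] += 1  # predicted author gets a false positive
            fn[t] += 1  # true author gets a false negative
    scores = {}
    for author in sorted(set(y_true)):
        predicted = tp[author] + fp[author]
        actual = tp[author] + fn[author]
        precision = tp[author] / predicted if predicted else 0.0
        recall = tp[author] / actual if actual else 0.0
        denom = precision + recall
        scores[author] = 2 * precision * recall / denom if denom else 0.0
    return scores


def macro_f1(scores):
    """Unweighted mean of the per-author F1 scores."""
    return sum(scores.values()) / len(scores)


def authors_above_threshold(y_true, y_pred, threshold=12):
    """Authors with at least `threshold` correctly matched test documents."""
    correct = Counter(t for t, p in zip(y_true, y_pred) if t == p)
    return [a for a in sorted(set(y_true)) if correct[a] >= threshold]


# Toy labels (not from the experiment): two authors, four documents each.
y_true = ["a", "a", "a", "a", "b", "b", "b", "b"]
y_pred = ["a", "a", "a", "b", "b", "b", "a", "a"]
scores = per_author_f1(y_true, y_pred)
average = macro_f1(scores)
well_matched = authors_above_threshold(y_true, y_pred, threshold=3)
```

In the experiment itself the threshold would be 12 and each author would have 20 test documents.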
