In the dark ages of my project, I needed to run several feature extractions over my dataset in parallel, and I found a scikit-learn tool that fits the job: FeatureUnion. It applies a list of transformer objects to the same input and concatenates their outputs side by side. I needed to extract both part-of-speech (POS) tag features and punctuation features, so I decided to use FeatureUnion in my project. I combined the individual POS tag vectors into a single union,
and of course I applied the same approach to the punctuation vectors.
Here is all of the feature union code:
from sklearn.pipeline import FeatureUnion
# One union over the per-tag POS transformers
combined_features_pos = FeatureUnion([("noun", noun_vector), ("verb", verb_vector), ("adjective", adjective_vector), ("adverb", adverb_vector), ("pronoun", pronoun_vector), ("conjunction", conjunction_vector), ("number", number_vector)])
# One union over the per-symbol punctuation transformers
combined_features_punct = FeatureUnion([("comma", comma_vector), ("period", period_vector), ("colon", colon_vector), ("semicolon", semicolon_vector), ("question", question_mark_vector), ("exclamation", exclamation_mark_vector), ("triple_dot", triple_dot_vector)])
That still wasn't enough for me, so I combined the two unions with one more FeatureUnion:
combined_features = FeatureUnion([("pos", combined_features_pos), ("punct", combined_features_punct)])
And that gives me my final combined features.
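To see what a nested FeatureUnion actually produces, here is a minimal sketch you can run on its own. The post doesn't show how noun_vector, comma_vector and the others are built, so SubstringCounter below is a hypothetical stand-in transformer that just counts one substring per document. The sample sentences and the union names "punct_a" and "punct_b" are made up for illustration too.

```python
import numpy as np
from sklearn.base import BaseEstimator, TransformerMixin
from sklearn.pipeline import FeatureUnion


class SubstringCounter(TransformerMixin, BaseEstimator):
    """Hypothetical stand-in: counts occurrences of `token` in each document."""

    def __init__(self, token):
        self.token = token

    def fit(self, X, y=None):
        # Nothing to learn; counting is stateless.
        return self

    def transform(self, X):
        # One row per document, a single count column.
        return np.array([[doc.count(self.token)] for doc in X])


# Made-up sample documents.
docs = ["Wait, what? Really?", "Yes; no, maybe."]

# Two small unions, each concatenating two one-column transformers.
punct_a = FeatureUnion([
    ("comma", SubstringCounter(",")),
    ("question", SubstringCounter("?")),
])
punct_b = FeatureUnion([
    ("semicolon", SubstringCounter(";")),
    ("period", SubstringCounter(".")),
])

# A union of unions: the outer one concatenates the outputs of the inner ones.
combined = FeatureUnion([("a", punct_a), ("b", punct_b)])

features = combined.fit_transform(docs)
print(features.shape)  # (2, 4): two documents, four count columns
print(features)
```

The columns come out in the order the transformers are listed, inner unions first: comma, question, semicolon, period. The same should hold for the POS and punctuation unions above, provided each of those vectors is a transformer with fit and transform methods.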