Roman Trusov

Text classification is one of the core machine learning problems. It is used, for example, to detect spam, identify the topic of a news article, or choose the correct meaning of a word with multiple senses. The Statsbot team has already written about how to train your own model for detecting spam emails, spam messages, and spam user comments. For this article, we asked a data scientist, Roman Trusov, to go deeper into machine learning text analysis.

As you may know, there is no single best text classifier. In fields such as computer vision, there's a strong consensus about a general way of designing models: deep networks with lots of residual connections. Text classification, by contrast, is still far from converging on a narrow set of approaches.

In this article, we'll focus on a few of the main general approaches to text classification and their use cases. Along with the high-level discussion, we offer a collection of hands-on tutorials and tools that can help you build your own models.


Stat and Bots