Resources
Resource | Description | Tool functionality |
---|---|---|
Omnia Russica | Omnia Russica (Latin for "all Russian") is an open-source corpus project containing 33 billion words. It combines major Russian corpus sources in a single pipeline. | |
Slovnet | SlovNet is a Python library for deep-learning-based NLP modeling for the Russian language. The library is integrated with other Natasha projects: a large NER corpus and compact Russian embeddings. | |
DeepPavlov | DeepPavlov is "a conversational artificial intelligence framework that contains all the components required for building dialogue systems"[1]. It builds on recent advances in deep bidirectional transformer models: if you've heard of BERT, the Russian counterpart here is RuBERT. There is also SlavicBERT for Bulgarian, Czech, Polish, and Russian. | NER |
Dostoevsky | A sentiment analysis library for Python. It takes a single word or a longer text as input and returns a sentiment classification of positive, negative, or neutral. Dostoevsky's model was trained on the RuSentiment dataset of more than 30,000 comments from VKontakte. VK has requested that the original dataset be taken down temporarily, but Dostoevsky allows you to download the trained vectors. | sentiment analysis |
Natasha | Natasha provides rule-based named-entity recognition for Python. It is designed to recognize the forms of Russian names (e.g. nicknames, patronymics), although it can make some improbable guesses. In addition to names, it can recognize dates and money. | NER |
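
To make the Dostoevsky entry concrete, here is a minimal sketch of how a classification call might look. It assumes the `dostoevsky` package is installed and that its FastText model has already been downloaded (the project documents a `python -m dostoevsky download fasttext-social-network-model` command for this). The helper names `top_label` and `classify` are hypothetical and exist only for this illustration.

```python
from typing import Dict, List


def top_label(scores: Dict[str, float]) -> str:
    """Return the label with the highest score (hypothetical helper)."""
    return max(scores, key=scores.get)


def classify(texts: List[str]) -> List[str]:
    """Return the most likely sentiment label for each text.

    Assumes dostoevsky is installed and its model has been downloaded.
    """
    # Imported lazily so top_label() stays usable without the library.
    from dostoevsky.models import FastTextSocialNetworkModel
    from dostoevsky.tokenization import RegexTokenizer

    model = FastTextSocialNetworkModel(tokenizer=RegexTokenizer())
    # predict() returns one {label: probability} dict per input text;
    # k limits how many of the top labels are kept for each text.
    predictions = model.predict(texts, k=2)
    return [top_label(scores) for scores in predictions]
```

Calling `classify(["всё очень плохо", "привет"])` would return one label per message. The exact labels and scores depend on the installed model version, so none are shown here.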
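
To illustrate what "rule-based" recognition of name forms means in the Natasha entry, the sketch below matches Russian patronymics by their suffixes with a regular expression. This is not Natasha's implementation or API. The pattern, the suffix list, and the name `find_patronymics` are assumptions made for this example only.

```python
import re
from typing import List

# Illustrative rule: a capitalised Cyrillic word ending in a common
# patronymic suffix. The suffix list is deliberately incomplete.
PATRONYMIC = re.compile(
    r"\b[А-ЯЁ][а-яё]+(?:ович|евич|ьич|овна|евна|ична)\b"
)


def find_patronymics(text: str) -> List[str]:
    """Return every word in text that looks like a patronymic."""
    return PATRONYMIC.findall(text)


print(find_patronymics("Фёдор Михайлович Достоевский и Анна Григорьевна"))
```

A rule this simple also shows why a rule-based tool can make improbable guesses: the surname in "Роман Абрамович" ends in the same suffix, so it is matched as if it were a patronymic.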