Text Mining with Python
Mentor
Gábor Kismihók

Description

This learning path introduces text mining techniques with a focus on practical implementation in Python. It starts with text preprocessing (lowercasing, punctuation removal, stopword removal, tokenization, stemming, and lemmatization), continues with text representation and tagging (Bag-of-Words, TF-IDF, Part-of-Speech tagging, Word2Vec, and Doc2Vec), and ends with sentiment analysis and topic modeling using Latent Semantic Analysis and Latent Dirichlet Allocation. Learners gain hands-on experience applying each concept in Python.

Learning objectives

  • Understand the fundamental concepts and applications of text mining.

  • Apply various text preprocessing techniques, including lowercasing, punctuation removal, stopword elimination, tokenization, stemming, and lemmatization, using Python.

  • Implement and utilize Bag-of-Words and TF-IDF models for text representation.

  • Perform Part-of-Speech tagging on text data with NLTK.

  • Understand and apply word embedding techniques like Word2Vec and Doc2Vec.

  • Conduct sentiment analysis on text data using Python.

  • Explore and implement topic modeling techniques such as Latent Semantic Analysis (LSA) and Latent Dirichlet Allocation (LDA).
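
To give a feel for the preprocessing objective, here is a minimal sketch using only the Python standard library. The function name `preprocess` and the tiny `STOPWORDS` set are hypothetical and chosen for illustration; the modules themselves may use a library such as NLTK and a fuller stopword list. Stemming and lemmatization are left out because they need language resources beyond the standard library.

```python
import string

# Hypothetical, deliberately tiny stopword list (an assumption for this sketch);
# real projects usually load a fuller list from a library such as NLTK.
STOPWORDS = {"a", "an", "and", "are", "is", "of", "the", "to", "with"}


def preprocess(text):
    """Lowercase, strip ASCII punctuation, split on whitespace, drop stopwords."""
    lowered = text.lower()
    # string.punctuation covers ASCII punctuation only.
    no_punct = lowered.translate(str.maketrans("", "", string.punctuation))
    tokens = no_punct.split()
    return [tok for tok in tokens if tok not in STOPWORDS]


print(preprocess("Text mining, with Python, is fun!"))
# → ['text', 'mining', 'python', 'fun']
```

Whitespace splitting is the simplest possible tokenizer; it does not handle contractions, hyphenated words, or non-ASCII punctuation, which is one reason dedicated tokenizers are normally preferred.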

18 modules included

Updated: 21.03.2026

Time required (hours): -

1. Overview of text mining
2. Lowercase conversion, punctuation and stopword removal, and text tokenization in Python
3. Stemming and lemmatization
4. Stemming and lemmatization in Python
5. Bag of Words
6. Bag of Words in Python
7. TF-IDF
8. TF-IDF in Python
9. Part-of-speech tagging
10. Part-of-speech tagging with NLTK
11. Word2Vec
12. Word2Vec in Python
13. Doc2Vec
14. Sentiment analysis in Python
15. Latent Semantic Analysis
16. Latent Semantic Analysis in Python
17. Latent Dirichlet Allocation
18. Latent Dirichlet Allocation in Python
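
As a taste of the Bag of Words and TF-IDF modules, the following sketch computes both by hand with the standard library. The toy corpus and the function names `bag_of_words` and `tf_idf` are assumptions made for illustration, and the weighting shown (term frequency as count divided by document length, inverse document frequency as ln(N / df)) is only one common textbook definition. Library implementations such as scikit-learn often use smoothed variants, so their numbers will differ.

```python
import math
from collections import Counter

# Toy corpus, already tokenized; the documents are made up for illustration.
docs = [
    ["text", "mining", "python"],
    ["python", "code", "python"],
    ["topic", "models", "text"],
]


def bag_of_words(tokens):
    """Raw term counts for one document."""
    return Counter(tokens)


def tf_idf(corpus):
    """Per-document weights with tf = count / doc length and idf = ln(N / df)."""
    n_docs = len(corpus)
    # Document frequency: number of documents that contain each term.
    df = Counter(term for doc in corpus for term in set(doc))
    weights = []
    for doc in corpus:
        counts = bag_of_words(doc)
        weights.append({
            term: (count / len(doc)) * math.log(n_docs / df[term])
            for term, count in counts.items()
        })
    return weights


weights = tf_idf(docs)
print(bag_of_words(docs[1]))  # term counts for the second document
print(weights[1])             # TF-IDF weights for the second document
```

In the second document, "python" occurs twice but also appears in another document, while "code" occurs once and is unique to this document, so the two weights end up closer than the raw counts suggest. A term that occurs in every document gets an idf of ln(1) = 0 under this definition.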