Demand for natural language processing (NLP) professionals has reached unprecedented heights and is expected to grow exponentially over the next few years. Supply, however, lags far behind. Both newcomers and experienced practitioners face challenges breaking into NLP, and one of the biggest pain points is the lack of a structured learning path.
There are plenty of resources out there covering NLP concepts, but most of them are scattered. Beginners end up reading piles of articles and books, browsing blogs and videos, and still struggle to piece together a coherent understanding. That is where an NLP learning path comes in! I'm delighted to present a comprehensive, structured path for learning NLP from scratch and mastering it.
Structure is at the heart of this work. The learning path has been popular precisely because of its structure and comprehensiveness. Here is how each month of the NLP learning path breaks down, to help you plan your journey:
Objective: What will you learn this month? What are the key takeaways? How will your NLP journey progress? Each month opens with this, so you always know where you stand and where you will be by the end of that month.
Suggested time: How many hours per week, on average, you should spend on this section.
Resources: A hand-picked collection of top resources on that month's NLP topics, including articles, tutorials, videos, research papers, and similar material.
Month 0 - Prerequisites (Optional)
Objective: This month is for those who are not yet comfortable with Python and data science. By the end of it, you should have a broad understanding of the building blocks of machine learning and of how to program in Python.
# Python for Data Science: Course: Python for Data Science
# Python Cheat Sheet
# Learn Statistics: Descriptive Statistics by Khan Academy
# Data Preparation: Training and Testing: Split Data using sklearn
# Linear Regression: A Comprehensive Guide on Linear Regression
# Video on Linear Regression:
# Logistic Regression: Logistic Regression using Python
# Video on Logistic Regression:
# Decision Tree Algorithm: Tutorial on Tree-Based Algorithms
# Introduction to Decision Trees:
# K-fold Cross-Validation: Improve Your Model Performance using Cross-Validation (in Python and R)
# K-Fold Cross Validation Video:
# Singular Value Decomposition (SVD): SVD from Scratch
# SVD by Gilbert Strang:
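To make the "SVD from Scratch" resource above concrete, here is a minimal sketch of what singular value decomposition gives you, using NumPy and a small made-up matrix (the data is purely illustrative):

```python
import numpy as np

# Hypothetical 3x2 data matrix for illustration only
A = np.array([[3.0, 1.0],
              [1.0, 3.0],
              [1.0, 1.0]])

# Thin SVD: A = U @ diag(s) @ Vt, with singular values s in descending order
U, s, Vt = np.linalg.svd(A, full_matrices=False)

# Rebuild A from its factors to verify the decomposition
A_rebuilt = U @ np.diag(s) @ Vt
print(np.allclose(A, A_rebuilt))  # True
```

Truncating `s` to the largest singular values is the idea behind dimensionality reduction techniques such as Latent Semantic Analysis, which reappears in Month 2.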
Month 1 - Getting Comfortable with Text Data
Objective: This month is about getting familiar with basic text preprocessing techniques. By the end of this section, you should be able to build a text classification model.
# Load Text Data from Multiple Sources: Pandas: How to Read and Write Files
# Learn to use Regular Expressions: Basics of Regular Expressions
# Speech and Language Processing by Stanford
# Text Preprocessing: spaCy library
# Tokenization using the spaCy library
# NLTK Library
# Stopword Removal and Text Normalization
# Exploratory Analysis of Text Data: A Complete Exploratory Data Analysis and Visualization for Text Data
# Extract Meta Features from Text: Traditional Methods for Text Data
# Project: Build a Text Classification model using Meta Features. You can use the dataset from the practice problem: Identify the Sentiments
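As a taste of the preprocessing and meta-feature ideas above, here is a standard-library-only sketch. The stopword list and sample sentence are made up for illustration; in practice you would use spaCy or NLTK as the resources describe:

```python
import re

STOPWORDS = {"the", "is", "a", "of", "and"}  # tiny illustrative list

def preprocess(text):
    """Lowercase, tokenize on word characters, and drop stopwords."""
    tokens = re.findall(r"[a-z0-9']+", text.lower())
    return [t for t in tokens if t not in STOPWORDS]

def meta_features(text):
    """Simple meta features of the kind used in the Month 1 project."""
    tokens = re.findall(r"[a-z0-9']+", text.lower())
    return {
        "char_count": len(text),
        "word_count": len(tokens),
        "avg_word_len": sum(len(t) for t in tokens) / max(len(tokens), 1),
    }

doc = "The sentiment of a tweet is often hidden in the words"
print(preprocess(doc))
print(meta_features(doc)["word_count"])  # 11
```

Meta features like these can be fed directly into a classifier such as logistic regression from Month 0, even before you learn richer text representations.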
Month 2 - Computational Linguistics and Word Vectors
Objective: This month you will start to see the magic of NLP. You will learn how to leverage English grammar to extract key information from text. You will also work with word vectors, an advanced technique for creating features from text.
# Extract Linguistic Features: Part-of-Speech Tagging using spaCy:
# Named Entity Recognition using spaCy:
# Dependency Parsing by Stanford:
# Text Representation in Vector Space: Bag of Words, TF-IDF and Word Embeddings
# Word Embeddings: Word Vector Representations (Word2Vec) by Stanford:
# Text Classification & Word Representations using FastText
# Tool: Gensim – Word2Vec
# Topic Modeling: Topic Modeling using Latent Semantic Analysis
# Beginner’s Guide to Topic Modeling in Python
# Topic Models: Information Extraction: Information Extraction using Python and spaCy
# Projects: Build a Sentiment Detection Model using Word Embeddings. You can use the dataset from the practice problem: Identify the Sentiments
# Categorize News Articles using Topic Modeling
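Before reaching for Gensim, it helps to compute a vector-space representation by hand. Here is a pure-Python TF-IDF sketch over a made-up three-document corpus (the smoothing scheme and corpus are illustrative, not the exact formula any one library uses):

```python
import math
from collections import Counter

docs = [
    "the cat sat on the mat",
    "the dog sat on the log",
    "cats and dogs are friends",
]

def tf_idf(docs):
    """Compute smoothed TF-IDF weights for a tiny corpus (illustrative only)."""
    tokenized = [d.split() for d in docs]
    n = len(tokenized)
    # Document frequency: in how many documents each term appears
    df = Counter(t for doc in tokenized for t in set(doc))
    weights = []
    for doc in tokenized:
        tf = Counter(doc)
        weights.append({
            t: (tf[t] / len(doc)) * math.log((1 + n) / (1 + df[t]))
            for t in tf
        })
    return weights

w = tf_idf(docs)
# "cat" is distinctive for the first document; "the" appears in most documents
print(w[0]["cat"] > w[0]["the"])  # True
```

The key intuition, which carries over to word embeddings, is that terms common across the whole corpus are down-weighted while document-specific terms stand out.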
Month 3 - A Deep Learning Refresher for NLP
Objective: Deep learning is at the core of the recent advances and breakthroughs in NLP. From Google's BERT to OpenAI's GPT-2, every NLP enthusiast should have at least a basic understanding of how deep learning powers these state-of-the-art NLP frameworks. So this month focuses on deep learning concepts, algorithms, and tools.
# Neural Networks: Introductory Guide to Deep Learning and Neural Networks
# Optimization Algorithms: Optimization Algorithms for Deep Learning
# Recurrent Neural Networks (RNNs) and LSTM: A friendly introduction to RNNs:
# Recurrent Neural Networks Tutorial, Part 3 – Backpropagation Through Time and Vanishing Gradients
# Research Paper: Fundamentals of RNN and LSTM
# Introduction to PyTorch: A Beginner-Friendly Guide to PyTorch
# Course: Introduction to Deep Learning with PyTorch
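The RNN resources above all revolve around one recurrence. Here is a minimal NumPy sketch of a vanilla RNN forward pass (the dimensions, random weights, and inputs are all made up for illustration; in practice you would use PyTorch's `nn.RNN`):

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy dimensions: 4-dim word vectors, 5-dim hidden state
input_dim, hidden_dim = 4, 5
Wxh = rng.normal(scale=0.1, size=(hidden_dim, input_dim))   # input -> hidden
Whh = rng.normal(scale=0.1, size=(hidden_dim, hidden_dim))  # hidden -> hidden
bh = np.zeros(hidden_dim)

def rnn_forward(xs):
    """Run a vanilla RNN over a sequence of vectors, returning all hidden states."""
    h = np.zeros(hidden_dim)
    states = []
    for x in xs:
        h = np.tanh(Wxh @ x + Whh @ h + bh)  # the core recurrence
        states.append(h)
    return states

sequence = [rng.normal(size=input_dim) for _ in range(3)]
states = rnn_forward(sequence)
print(len(states), states[-1].shape)  # 3 (5,)
```

Because gradients flow through `Whh` repeatedly during backpropagation through time, they can shrink or explode, which is exactly the vanishing-gradient problem the tutorial above discusses and the motivation for LSTMs.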
Month 4 - Deep Learning Models for NLP
Objective: Now that you have had a taste of deep learning and its applications in NLP, it is time to step it up. Dive into advanced deep learning concepts such as recurrent neural networks (RNNs) and long short-term memory (LSTM) networks. These will help you master industry-grade NLP use cases.
# Recurrent Neural Networks (RNNs) for Text Classification: RNN – PyTorch
# Sentiment Analysis using LSTM
# Understanding Bidirectional RNN in PyTorch
# CNN Models for NLP: Understanding CNN for NLP
# Projects: Build a model to find named entities in the text using LSTM. You can get the dataset from here
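To complement the "Understanding CNN for NLP" resource, here is a NumPy sketch of the feature extractor behind CNN text classifiers: sliding filters over windows of word embeddings, then max-over-time pooling. All dimensions and weights are made up for illustration:

```python
import numpy as np

rng = np.random.default_rng(1)

# Toy sentence of 6 words, each represented by a 4-dim embedding
emb_dim, sent_len, n_filters, width = 4, 6, 3, 2
sentence = rng.normal(size=(sent_len, emb_dim))
filters = rng.normal(size=(n_filters, width, emb_dim))

def conv_max_pool(sentence, filters):
    """Convolve each filter over every window of `width` consecutive words,
    then keep only the strongest response per filter (max-over-time pooling)."""
    sent_len = sentence.shape[0]
    n_filters, width, _ = filters.shape
    feature_maps = np.array([
        [np.sum(sentence[i:i + width] * f) for i in range(sent_len - width + 1)]
        for f in filters
    ])
    return feature_maps.max(axis=1)  # one feature per filter

features = conv_max_pool(sentence, filters)
print(features.shape)  # (3,)
```

Each filter learns to detect an n-gram-like pattern, and the pooled vector feeds a final classification layer, which is why CNNs are a fast, strong baseline for text classification.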
Month 5 - Sequence Modeling
Objective: This month you will learn to work with sequence models, which handle sequences as input and/or output. A very useful concept in NLP, as you will soon discover!
# Language Modeling: Language Models and RNNs by Stanford:
# A Comprehensive Guide to Build your own Language Model in Python!
# Text Generation with PyTorch
# Research Paper: Regularizing and Optimizing LSTM Language Models
# Book: Speech and Language Processing – N-gram Language Models
# Sequence-to-Sequence Modeling: PyTorch Seq2Seq
# Research Paper: Sequence to Sequence Learning with Neural Networks
# Seq2Seq with Attention
# Projects: Train a language model on Enron Email dataset to build an auto-completion system
# Build a Neural Machine Translation Model (English to any language of your choice)
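The auto-completion project above can be prototyped with the simplest language model there is: a bigram count model. This sketch uses a three-line made-up corpus standing in for the Enron emails; a neural language model from the resources above would replace the counting step:

```python
from collections import Counter, defaultdict

# Made-up stand-in for the Enron Email corpus
corpus = [
    "please send the report today",
    "please send the invoice",
    "send the report by friday",
]

# Count bigrams: how often each word follows another
bigrams = defaultdict(Counter)
for line in corpus:
    words = line.split()
    for prev, nxt in zip(words, words[1:]):
        bigrams[prev][nxt] += 1

def complete(word):
    """Suggest the most likely next word, as a toy auto-completion system."""
    followers = bigrams.get(word)
    return followers.most_common(1)[0][0] if followers else None

print(complete("send"))  # the
print(complete("the"))   # report
```

Chaining such next-word predictions generates text, which is exactly what the RNN and seq2seq models this month do with learned representations instead of raw counts.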
Month 6 - Transfer Learning in NLP
Objective: Transfer learning is extremely popular in NLP right now. It has effectively democratized the state-of-the-art NLP frameworks you encountered earlier. This month introduces BERT, GPT-2, ULMFiT, and Transformers.
# ULMFiT: Text Classification using ULMFiT in Python
# ULMFiT by FastAI:
# Transformers: Research Paper: Attention is all you Need
# How do Transformers Work in NLP?
# Pre-trained Large Language Models (BERT and GPT-2): Demystifying BERT
# Bert-As-a-Service
# Research Paper: BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding
# Tool: Transformers
# Fine-Tuning pre-trained Models: BERT Fine-Tuning Tutorial with PyTorch
# Chatbots: Rasa Masterclass:
# Learn how to Build and Deploy a Chatbot in Minutes using Rasa
# How to build a voice assistant with open source Rasa and Mozilla tools
# Audio Processing: Speech Data Exploration
# Audio Classification
# Pre-trained speech-to-text model – DeepSpeech
# Project: Build a chatbot with voice interface using Rasa
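The "Attention is all you Need" paper above boils down to one operation: scaled dot-product attention. Here is a NumPy sketch of it with made-up query/key/value matrices (real Transformers add learned projections, multiple heads, and masking on top of this):

```python
import numpy as np

def scaled_dot_product_attention(Q, K, V):
    """The core Transformer operation: softmax(Q K^T / sqrt(d_k)) V."""
    d_k = Q.shape[-1]
    scores = Q @ K.T / np.sqrt(d_k)
    # Softmax over the keys, stabilized by subtracting the row max
    weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
    weights /= weights.sum(axis=-1, keepdims=True)
    return weights @ V, weights

rng = np.random.default_rng(2)
Q = rng.normal(size=(3, 8))  # 3 query vectors of dimension 8
K = rng.normal(size=(4, 8))  # 4 key vectors
V = rng.normal(size=(4, 8))  # 4 value vectors

out, weights = scaled_dot_product_attention(Q, K, V)
print(out.shape)                               # (3, 8)
print(np.allclose(weights.sum(axis=-1), 1.0))  # True
```

Each output row is a weighted mix of the value vectors, with the weights determined by query-key similarity; stacking this with learned projections is what BERT and GPT-2 pre-train at scale.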