A Learning Path Guide for Natural Language Processing

As global demand for Natural Language Processing (NLP) experts keeps growing, the industry's need for specialists in this field has reached unprecedented heights, and that demand is expected to grow exponentially over the next few years. Supply, however, lags far behind. Whether you are a newcomer or an experienced professional, breaking into NLP is challenging, and one of the biggest pain points is the lack of a systematic learning path.

There is no shortage of resources covering NLP concepts, but most of them are scattered. Beginners read piles of articles and books and browse countless blogs and videos, yet struggle to piece everything together into a coherent understanding. That is where this NLP learning path comes in! I'm delighted to offer a comprehensive, structured learning path to help you learn and master NLP from scratch.

The NLP Learning Path Framework

Structure is at the heart of this work; the learning path is popular precisely because of its structure and comprehensiveness. Here is how each month of the NLP learning path is broken down to help you plan your journey:

  • What will you learn this month? What are the key takeaways? How will your NLP journey progress? Each month opens with this overview, so you always know where you stand and where you will be by the end of that month.
  • How much time, on average, you should spend on the section each week.
  • A collection of top resources for the NLP topics covered that month, including articles, tutorials, videos, research papers, and similar material.

Other Data Science Learning Paths

If you are looking for other learning paths, the wait is over:

  • A learning path to become a data scientist and master machine learning in 2020
  • A learning path to master deep learning in 2020
  • Computer vision learning path (launching January 9)

Let's Dive In

Month 0 - Prerequisites (Optional)

Objective: This month is for those who are not yet comfortable with Python and data science. By the end of it, you should have a broad understanding of the building blocks of machine learning and how to program in Python.

  • Python for Data Science: Course: Python for Data Science
  • Python Cheat Sheet
  • Learn Statistics: Descriptive Statistics by Khan Academy
  • Data Preparation: Training and Testing: Split Data using sklearn
  • Linear Regression: A Comprehensive Guide on Linear Regression
  • Video on Linear Regression
  • Logistic Regression: Logistic Regression using Python
  • Video on Logistic Regression
  • Decision Tree Algorithm: Tutorial on Tree-Based Algorithms
  • Introduction to Decision Trees
  • K-fold Cross-Validation: Improve Your Model Performance using Cross-Validation (in Python and R)
  • K-Fold Cross Validation Video
  • Singular Value Decomposition (SVD): SVD from Scratch
  • SVD by Gilbert Strang
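To make the k-fold cross-validation idea from the resources above concrete, here is a minimal sketch in plain Python that just produces the train/validation index splits. In practice you would use sklearn's `KFold` (which also handles shuffling); this toy version only illustrates the partitioning logic.

```python
# Minimal k-fold cross-validation index splitting, written in plain
# Python to show the idea behind sklearn's KFold.

def k_fold_indices(n_samples, k):
    """Yield (train_indices, validation_indices) pairs for k folds."""
    indices = list(range(n_samples))
    # Distribute the remainder so fold sizes differ by at most one.
    fold_sizes = [n_samples // k + (1 if i < n_samples % k else 0)
                  for i in range(k)]
    start = 0
    for size in fold_sizes:
        stop = start + size
        validation = indices[start:stop]
        train = indices[:start] + indices[stop:]
        yield train, validation
        start = stop

folds = list(k_fold_indices(10, 3))
# Every sample lands in exactly one validation fold.
```

Each fold's validation set is disjoint from its training set, and together the validation sets cover all samples exactly once, which is what makes the averaged validation score a low-variance estimate of model performance.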

Month 1 - Getting Comfortable with Text Data

Objective: This month is about getting comfortable with basic text preprocessing techniques. By the end of this section, you should be able to build a text classification model.

  • Load Text Data from Multiple Sources: Pandas: How to Read and Write Files
  • Learn to use Regular Expressions: Basics of Regular Expressions
  • Speech and Language Processing by Stanford
  • Text Preprocessing: spaCy library
  • Tokenization using the spaCy library
  • NLTK Library
  • Stopword Removal and Text Normalization
  • Exploratory Analysis of Text Data: A Complete Exploratory Data Analysis and Visualization for Text Data
  • Extract Meta Features from Text: Traditional Methods for Text Data
  • Project: Build a Text Classification model using Meta Features. You can use the dataset from the practice problem Identify the Sentiments
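As a taste of the meta features mentioned above, here is a small sketch in plain Python: tokenization with a regular expression, a toy stopword-removal step, and a handful of count-based features. Real pipelines would use spaCy or NLTK with their full stopword lists; the tiny `STOPWORDS` set here is only for illustration.

```python
import re

# Toy stopword list for illustration only; NLTK and spaCy ship
# much larger, curated lists.
STOPWORDS = {"the", "a", "an", "is", "are", "to", "of"}

def tokenize(text):
    """Lowercase the text and split on non-word characters."""
    return [t for t in re.split(r"\W+", text.lower()) if t]

def meta_features(text):
    """Count-based meta features often used alongside a classifier."""
    tokens = tokenize(text)
    content = [t for t in tokens if t not in STOPWORDS]
    return {
        "char_count": len(text),
        "word_count": len(tokens),
        "stopword_count": len(tokens) - len(content),
        "avg_word_len": sum(map(len, tokens)) / len(tokens) if tokens else 0.0,
    }

features = meta_features("The movie is a delight to watch")
```

Features like these are cheap to compute and, fed into a simple classifier, often make a surprisingly strong baseline for the Identify the Sentiments project.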

Month 2 - Computational Linguistics and Word Vectors

Objective: This is the month where you start to see the magic of NLP. You will learn how to use English grammar to extract key information from text, and you will work with word vectors, an advanced technique for creating features from text.

  • Extract Linguistic Features: Part-of-Speech Tagging using spaCy
  • Named Entity Recognition using spaCy
  • Dependency Parsing by Stanford
  • Text Representation in Vector Space: Bag of Words, TF-IDF and Word Embeddings
  • Word Embeddings: Word Vector Representations (Word2Vec) by Stanford
  • Text Classification & Word Representations using FastText
  • Tool: Gensim – Word2Vec
  • Topic Modeling: Topic Modeling using Latent Semantic Analysis
  • Beginner's Guide to Topic Modeling in Python
  • Topic Models
  • Information Extraction: Information Extraction using Python and spaCy
  • Projects: Build a Sentiment Detection Model using Word Embeddings. You can use the dataset from the practice problem Identify the Sentiments
  • Categorize News Articles using Topic Modeling
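To ground the "text representation in vector space" material above, here is a minimal bag-of-words TF-IDF sketch in plain Python. In practice you would reach for scikit-learn's `TfidfVectorizer` or Gensim, both of which add smoothing and normalization; this version uses the raw textbook formulas.

```python
import math

def tf_idf(docs):
    """docs: list of token lists. Returns one {term: weight} dict per doc."""
    n = len(docs)
    # Document frequency: in how many documents does each term appear?
    df = {}
    for doc in docs:
        for term in set(doc):
            df[term] = df.get(term, 0) + 1
    weights = []
    for doc in docs:
        scores = {}
        for term in set(doc):
            tf = doc.count(term) / len(doc)      # term frequency
            idf = math.log(n / df[term])         # inverse document frequency
            scores[term] = tf * idf
        weights.append(scores)
    return weights

w = tf_idf([["cat", "sat"], ["cat", "ran"], ["dog", "ran"]])
# "cat" appears in 2 of 3 docs, so its weight is lower than "sat",
# which appears in only 1 of 3.
```

The key intuition: a term is informative for a document when it is frequent there (high TF) but rare across the corpus (high IDF), which is exactly what the product rewards.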

Month 3 - A Deep Learning Refresher for NLP

Objective: Deep learning is at the heart of recent developments and breakthroughs in NLP. From Google's BERT to OpenAI's GPT-2, every NLP enthusiast should have at least a basic understanding of how deep learning powers these state-of-the-art NLP frameworks. So this month focuses on deep learning concepts, algorithms, and tools.

  • Neural Networks: Introductory Guide to Deep Learning and Neural Networks
  • Optimization Algorithms: Optimization Algorithms for Deep Learning
  • Recurrent Neural Networks (RNNs) and LSTM: A friendly introduction to RNNs
  • Recurrent Neural Networks Tutorial, Part 3 – Backpropagation Through Time and Vanishing Gradients
  • Research Paper: Fundamentals of RNN and LSTM
  • Introduction to PyTorch: A Beginner-Friendly Guide to PyTorch
  • Course: Introduction to Deep Learning with PyTorch
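The core of the RNN material above fits in one line of math: h_t = tanh(W_x x_t + W_h h_{t-1} + b). Here is that recurrence sketched with plain Python scalars standing in for the weight matrices a framework like PyTorch's `nn.RNN` would learn; the weight values below are arbitrary and for illustration only.

```python
import math

def rnn_step(x_t, h_prev, w_x=0.5, w_h=0.8, b=0.1):
    """One recurrent cell step: h_t = tanh(w_x*x_t + w_h*h_prev + b)."""
    return math.tanh(w_x * x_t + w_h * h_prev + b)

# Unroll the cell over a short input sequence, carrying the hidden
# state forward so each step sees a summary of everything before it.
h = 0.0
for x in [1.0, -0.5, 0.25]:
    h = rnn_step(x, h)
```

Because the same weights are reused at every step, gradients flowing back through the unrolled chain get multiplied repeatedly, which is precisely the vanishing-gradient problem the BPTT tutorial above discusses and that LSTMs were designed to mitigate.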

Month 4 - Deep Learning Models for NLP

Objective: Now that you have had a taste of deep learning and its applications in NLP, it is time to step things up. Dive into advanced deep learning concepts such as recurrent neural networks (RNNs) and long short-term memory (LSTM). These will help you master industry-grade NLP use cases.

  • Recurrent Neural Networks (RNNs) for Text Classification: RNN – PyTorch
  • Sentiment Analysis using LSTM
  • Understanding Bidirectional RNN in PyTorch
  • CNN Models for NLP: Understanding CNN for NLP
  • Project: Build a model to find named entities in the text using LSTM. You can get the dataset from here
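The NER project above trains an LSTM to emit one BIO tag per token (B-xxx begins an entity, I-xxx continues it, O is outside). Whatever model produces the tags, you still need to decode them into entity spans; this helper is a minimal sketch of that decoding step, independent of the LSTM itself.

```python
def decode_bio(tokens, tags):
    """Collect (entity_text, entity_type) spans from parallel BIO tags."""
    entities, current, etype = [], [], None
    for token, tag in zip(tokens, tags):
        if tag.startswith("B-"):
            if current:                       # close any open entity
                entities.append((" ".join(current), etype))
            current, etype = [token], tag[2:]
        elif tag.startswith("I-") and current and tag[2:] == etype:
            current.append(token)             # continue the open entity
        else:                                 # O tag or malformed I- tag
            if current:
                entities.append((" ".join(current), etype))
            current, etype = [], None
    if current:                               # flush an entity at end of sentence
        entities.append((" ".join(current), etype))
    return entities

spans = decode_bio(
    ["Apple", "hired", "Tim", "Cook", "in", "Cupertino"],
    ["B-ORG", "O", "B-PER", "I-PER", "O", "B-LOC"],
)
# → [("Apple", "ORG"), ("Tim Cook", "PER"), ("Cupertino", "LOC")]
```

Note the B-/I- distinction is what lets the scheme separate two adjacent entities of the same type, which plain per-token type labels cannot do.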

Month 5 - Sequence Modeling

Objective: This month you will learn about sequential models, which handle sequences as input and/or output. A very useful concept in NLP, as you will soon discover!

  • Language Modeling: Language Models and RNNs by Stanford
  • A Comprehensive Guide to Build your own Language Model in Python!
  • Text Generation with PyTorch
  • Research Paper: Regularizing and Optimizing LSTM Language Models
  • Book: Speech and Language Processing – N-gram Language Models
  • Sequence-to-Sequence Modeling: PyTorch Seq2Seq
  • Research Paper: Sequence to Sequence Learning with Neural Networks
  • Seq2Seq with Attention
  • Projects: Train a language model on the Enron Email dataset to build an auto-completion system
  • Build a Neural Machine Translation Model (English to any language of your choice)
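Before the neural language models above, it helps to see the n-gram version from the Speech and Language Processing chapter in code. This is a tiny bigram model in plain Python: estimate P(next word | current word) from counts. The three-sentence corpus is made up for illustration, and real models need smoothing to handle unseen bigrams.

```python
from collections import Counter, defaultdict

def train_bigram(sentences):
    """Estimate P(next | prev) from a list of token lists."""
    counts = defaultdict(Counter)
    for sent in sentences:
        tokens = ["<s>"] + sent + ["</s>"]   # sentence boundary markers
        for prev, nxt in zip(tokens, tokens[1:]):
            counts[prev][nxt] += 1
    # Normalize counts into conditional probabilities.
    return {
        prev: {nxt: c / sum(nexts.values()) for nxt, c in nexts.items()}
        for prev, nexts in counts.items()
    }

model = train_bigram([
    ["i", "like", "nlp"],
    ["i", "like", "python"],
    ["i", "study", "nlp"],
])
# After "like", probability splits evenly between "nlp" and "python".
```

An auto-completion system like the Enron project is, at its core, just repeatedly picking a high-probability next word from a model like this (with a far larger n and corpus).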

Month 6 - Transfer Learning in NLP

Objective: Transfer learning is all the rage in NLP right now. It has effectively democratized the state-of-the-art NLP frameworks you encountered earlier. This month introduces BERT, GPT-2, ULMFiT, and Transformers.

  • ULMFiT: Text Classification using ULMFiT in Python
  • ULMFiT by FastAI
  • Transformers: Research Paper: Attention is all you Need
  • How do Transformers Work in NLP?
  • Pre-trained Large Language Models (BERT and GPT-2): Demystifying BERT
  • Bert-As-a-Service
  • Research Paper: BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding
  • Tool: Transformers
  • Fine-Tuning pre-trained Models: BERT Fine-Tuning Tutorial with PyTorch
  • Chatbots: Rasa Masterclass
  • Learn how to Build and Deploy a Chatbot in Minutes using Rasa
  • How to build a voice assistant with open source Rasa and Mozilla tools
  • Audio Processing: Speech Data Exploration
  • Audio Classification
  • Pre-trained speech-to-text model – DeepSpeech
  • Project: Build a chatbot with voice interface using Rasa
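The core operation behind the Transformer paper listed above is scaled dot-product attention: softmax(QKᵀ/√d)V. Here is that formula sketched with plain Python lists so the mechanics are visible. Real implementations (PyTorch, the Transformers library) operate on batched tensors with multiple heads; the 2-dimensional query/key/value vectors below are made-up toy values.

```python
import math

def attention(queries, keys, values):
    """Scaled dot-product attention over lists of equal-length vectors."""
    d = len(keys[0])
    outputs = []
    for q in queries:
        # Similarity of this query to every key, scaled by sqrt(d).
        scores = [sum(qi * ki for qi, ki in zip(q, k)) / math.sqrt(d)
                  for k in keys]
        # Numerically stable softmax over the scores.
        m = max(scores)
        exps = [math.exp(s - m) for s in scores]
        total = sum(exps)
        weights = [e / total for e in exps]
        # Output is the attention-weighted average of the value vectors.
        outputs.append([sum(w * v[j] for w, v in zip(weights, values))
                        for j in range(len(values[0]))])
    return outputs

out = attention(
    queries=[[1.0, 0.0]],
    keys=[[1.0, 0.0], [0.0, 1.0]],
    values=[[10.0, 0.0], [0.0, 10.0]],
)
# The query aligns with the first key, so the output leans toward
# the first value vector.
```

Because the softmax weights sum to one, each output is a convex combination of the value vectors, with more weight on values whose keys match the query; stacking this operation with learned projections is what BERT and GPT-2 build on.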

NLP Learning Path Infographics

  • They help you visualize the structure of how you will learn the different topics
  • They can serve as checklists: tick off concepts as you progress through your NLP journey