The Ultimate Guide to the Xiaohongshu API: Automated Data Collection with Python in 5 Minutes


xhs: a request wrapper built on the Xiaohongshu web client. Documentation: https://reajason.github.io/xhs/ Project page: https://gitcode.com/gh_mirrors/xh/xhs

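Have you ever wanted an easy way to pull data on trending Xiaohongshu content, or to analyze how a competitor runs its account? This article introduces the xhs Python library. It wraps the requests made by the Xiaohongshu web client, so you can reach Xiaohongshu's data endpoints programmatically.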

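Scenario: when data collection hits technical obstacles

Picture this: you work in content operations and have to collect the latest note data from 100 competitor accounts by hand every day, including likes, comments, and favorites, and then analyze when those accounts post. Manual work is slow and error-prone. Worse, Xiaohongshu's anti-scraping measures keep getting stricter, so an ordinary crawler rarely stays stable.

That is the problem xhs was built for. By simulating real browser behavior, it gets past Xiaohongshu's anti-scraping checks so that you can fetch the data you need reliably and efficiently.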

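Core concept: what is the xhs library?

xhs is a Python data collection library built specifically for Xiaohongshu. It wraps the signing and request logic so that you can access Xiaohongshu data much as you would call an ordinary API.

Key features:

🔐 Automatic signing: handles the x-s and x-t signature algorithm
🚀 Stable and reliable: built-in retries and error handling
📱 Multiple login methods: QR code login and cookie login
📊 Rich data endpoints: notes, user profiles, search results, and more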

Hands-on: using xhs from scratch

Step 1: set up the environment

First, install the required dependencies:

# Install the xhs library
pip install xhs
# Install Playwright for browser automation
pip install playwright
# Install the browser binaries
playwright install
# Download the anti-detection (stealth) script
curl -O https://cdn.jsdelivr.net/gh/requireCool/stealth.min.js/stealth.min.js

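Step 2: get your Xiaohongshu cookie

To use xhs you need the cookie from the Xiaohongshu website. The simplest way to get it:

Log in to the Xiaohongshu web site in your browser
Open the developer tools (F12)
In the console, enter: document.cookie
Copy the cookie string it prints

The key cookie fields are:

a1: user identity identifier
web_session: session identifier
webId: device identifier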
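
Before handing the copied string to the library, it can help to confirm that the three fields listed above are actually present. The following is a minimal sketch using only the standard library; `parse_cookie_string` and `missing_fields` are hypothetical helper names introduced here for illustration and are not part of xhs.

```python
# Hypothetical helpers (not part of xhs): sanity-check a copied cookie string.
REQUIRED_FIELDS = ("a1", "web_session", "webId")


def parse_cookie_string(raw: str) -> dict:
    """Split a 'k1=v1; k2=v2' cookie string into a dict."""
    result = {}
    for part in raw.split(";"):
        # partition() splits on the first '=', so values may contain '='
        name, sep, value = part.strip().partition("=")
        if sep and name:
            result[name] = value
    return result


def missing_fields(raw: str) -> list:
    """Return the required field names that are absent or empty."""
    cookies = parse_cookie_string(raw)
    return [field for field in REQUIRED_FIELDS if not cookies.get(field)]
```

If `missing_fields(cookie)` returns a non-empty list, the cookie was probably copied incompletely and is worth re-copying before debugging anything else.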

Step 3: write your first collection script

Let's write a simple script that fetches the details of a single note:

from time import sleep

from playwright.sync_api import sync_playwright
from xhs import DataFetchError, XhsClient, help


def sign(uri, data=None, a1="", web_session=""):
    """Signing function: produce the x-s / x-t values for a request."""
    for _ in range(10):
        try:
            with sync_playwright() as playwright:
                stealth_js_path = "stealth.min.js"  # path to the file you downloaded
                chromium = playwright.chromium
                browser = chromium.launch(headless=True)
                browser_context = browser.new_context()
                browser_context.add_init_script(path=stealth_js_path)
                context_page = browser_context.new_page()
                context_page.goto("https://www.xiaohongshu.com")
                browser_context.add_cookies([
                    {'name': 'a1', 'value': a1, 'domain': ".xiaohongshu.com", 'path': "/"}
                ])
                context_page.reload()
                sleep(1)
                # Call the page's own signing function inside the browser
                encrypt_params = context_page.evaluate(
                    "([url, data]) => window._webmsxyw(url, data)", [uri, data]
                )
                return {
                    "x-s": encrypt_params["X-s"],
                    "x-t": str(encrypt_params["X-t"])
                }
        except Exception:
            # Any failure triggers another attempt, up to 10 in total
            pass
    raise Exception("Signing failed")


# Usage example
if __name__ == '__main__':
    cookie = "your Xiaohongshu cookie"
    xhs_client = XhsClient(cookie, sign=sign)

    # Fetch the note details
    note_id = "6505318c000000001f03c5a6"  # example note ID
    note = xhs_client.get_note_by_id(note_id)

    # Extract the image URLs
    image_urls = help.get_imgs_url_from_note(note)

    print(f"Note title: {note.get('title', 'untitled')}")
    print(f"Likes: {note.get('likes', 0)}")
    print(f"Image count: {len(image_urls)}")

Step 4: explore more features

xhs exposes a range of endpoints. Here are a few practical examples.

Fetch a user's profile:

# Fetch user info
user_info = xhs_client.get_user_info("user ID")
print(f"Nickname: {user_info['nickname']}")
print(f"Followers: {user_info['fans']}")
print(f"Likes and favorites: {user_info['interactions']}")

Search for content:

from xhs import SearchNoteType, SearchSortType

# Search for food-related video notes
search_results = xhs_client.get_note_by_keyword(
    keyword="美食",
    page=1,
    page_size=20,
    sort=SearchSortType.GENERAL,
    note_type=SearchNoteType.VIDEO
)

for note in search_results['items']:
    print(f"Title: {note['title']}")
    print(f"Author: {note['user']['nickname']}")
    print(f"Likes: {note['likes']}")
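
The search call above returns one page at a time. To walk several pages, a small generator can wrap any page-fetching function. This is a sketch under two assumptions: each page is a dict with an `items` list (as the loop above implies), and an empty page means there is nothing more to fetch. `iter_search_results` and `fetch_page` are hypothetical names, not part of xhs.

```python
from typing import Callable, Iterator


def iter_search_results(
    fetch_page: Callable[[int], dict], max_pages: int = 5
) -> Iterator[dict]:
    """Yield items page by page until a page is empty or max_pages is hit.

    fetch_page is any callable that takes a 1-based page number and returns
    a dict with an "items" list (assumed response shape).
    """
    for page in range(1, max_pages + 1):
        result = fetch_page(page)
        items = result.get("items") or []
        if not items:
            break  # an empty page is treated as the end of the results
        yield from items
```

With the client from step 3, `fetch_page` could be `lambda p: xhs_client.get_note_by_keyword(keyword="美食", page=p, page_size=20)`. The `max_pages` cap keeps a runaway loop from issuing an unbounded number of requests.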

Fetch the recommendation feed:

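from xhs import FeedType

# Fetch recommended fashion content
# (FASION is the spelling used in this example's enum member)
fashion_feed = xhs_client.get_home_feed(
    feed_type=FeedType.FASION,
    cursor=""
)

for note in fashion_feed['items']:
    print(f"Note ID: {note['id']}")
    print(f"Content: {note['desc'][:50]}...")  # show the first 50 characters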

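Advanced techniques: building a stable, reliable collection system

Technique 1: rotate across multiple accounts

To reduce the risk of a ban, rotate requests across several accounts: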

from xhs import XhsClient


class MultiAccountXhsClient:
    def __init__(self, accounts):
        # accounts: list of dicts with 'cookie' and 'sign_func' keys
        self.accounts = accounts
        self.current_index = 0

    def get_client(self):
        """Return a client for the next account in round-robin order."""
        account = self.accounts[self.current_index]
        self.current_index = (self.current_index + 1) % len(self.accounts)
        return XhsClient(account['cookie'], sign=account['sign_func'])

    def safe_request(self, func, *args, **kwargs):
        """Run func(client, ...) and switch accounts on failure."""
        for _ in range(len(self.accounts)):
            try:
                client = self.get_client()
                return func(client, *args, **kwargs)
            except Exception as e:
                print(f"Request failed, switching account: {e}")
                continue
        raise Exception("Request failed on every account")

Technique 2: store and process the data

Collected data needs to be stored and processed properly:

import sqlite3
from datetime import datetime

import pandas as pd


class DataManager:
    def __init__(self, db_path="xhs_data.db"):
        self.conn = sqlite3.connect(db_path)
        self.create_tables()

    def create_tables(self):
        """Create the notes table if it does not exist."""
        self.conn.execute('''
            CREATE TABLE IF NOT EXISTS notes (
                id TEXT PRIMARY KEY,
                title TEXT,
                content TEXT,
                likes INTEGER,
                collects INTEGER,
                comments INTEGER,
                user_id TEXT,
                create_time TIMESTAMP,
                update_time TIMESTAMP
            )
        ''')

    def save_note(self, note_data):
        """Insert or replace a single note."""
        # Store timestamps as ISO 8601 text; sqlite3's implicit datetime
        # adapter is deprecated as of Python 3.12.
        now = datetime.now().isoformat(timespec="seconds")
        self.conn.execute('''
            INSERT OR REPLACE INTO notes
            VALUES (?, ?, ?, ?, ?, ?, ?, ?, ?)
        ''', (
            note_data['id'],
            note_data.get('title', ''),
            note_data.get('desc', ''),
            note_data.get('likes', 0),
            note_data.get('collects', 0),
            note_data.get('comments', 0),
            note_data.get('user', {}).get('user_id', ''),
            now,
            now
        ))
        self.conn.commit()

    def export_to_excel(self, output_path="xhs_data.xlsx"):
        """Export all notes to Excel (pandas needs openpyxl to write .xlsx)."""
        df = pd.read_sql_query("SELECT * FROM notes", self.conn)
        df.to_excel(output_path, index=False)
        print(f"Data exported to: {output_path}")

Technique 3: exception handling and monitoring

import logging
import time

from xhs.exception import DataFetchError, IPBlockError

# Configure logging to both a file and the console
logging.basicConfig(
    level=logging.INFO,
    format='%(asctime)s - %(levelname)s - %(message)s',
    handlers=[
        logging.FileHandler('xhs_crawler.log'),
        logging.StreamHandler()
    ]
)


def safe_crawl_note(client, note_id, max_retries=3):
    """Fetch a note with retries; return None if it cannot be fetched."""
    for attempt in range(max_retries):
        try:
            note = client.get_note_by_id(note_id)
            logging.info(f"Fetched note: {note_id}")
            return note
        except DataFetchError as e:
            logging.warning(f"Data fetch failed, attempt {attempt + 1}: {e}")
            time.sleep(2 ** attempt)  # exponential backoff
        except IPBlockError as e:
            logging.error(f"IP blocked: {e}")
            # IP switching logic could go here
            break
        except Exception as e:
            logging.error(f"Unexpected error: {e}")
            break
    logging.error(f"Failed to fetch note: {note_id}")
    return None

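Resources: a one-stop learning path

Official documentation and examples

The project's documentation is clearly organized and comes with plenty of example code:

Basic usage guide: docs/basic.rst - installation, configuration, and basic usage
Example code directory: example/ - several practical examples
Core source code: xhs/ - the library's core implementation

Core modules

Here is the project's main module structure:

xhs/core.py - the core client class, containing all API methods
xhs/help.py - helper functions such as image URL extraction and signature generation
xhs/exception.py - custom exception classes
xhs-api/app.py - the Flask API service implementation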

Quick start

For a quick trial, you can start the service with a single Docker command:

# Start the API service
docker run -it -d -p 5005:5005 reajason/xhs-api:latest

Then call it over HTTP:

import requests

# Call the signing service
response = requests.post(
    "http://localhost:5005/sign",
    json={"uri": "/api/sns/web/v1/feed", "data": {}}
)
signature = response.json()
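
The HTTP call above can be wrapped into a function with the same shape as the `sign` function from step 3, so that the Docker service replaces the local browser. This is only a sketch under stated assumptions: that the service replies with `x-s` and `x-t` keys, and that it accepts `a1` and `web_session` alongside `uri` and `data`. Neither is confirmed by this article, so check the reply of your running service first. `post_json` and `make_remote_sign` are hypothetical names; the sketch uses the standard library rather than requests to stay dependency-free.

```python
import json
import urllib.request
from typing import Callable

SIGN_URL = "http://localhost:5005/sign"  # endpoint from the example above


def post_json(url: str, payload: dict, timeout: float = 10.0) -> dict:
    """POST a JSON body and decode the JSON reply (standard library only)."""
    request = urllib.request.Request(
        url,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return json.loads(response.read().decode("utf-8"))


def make_remote_sign(post: Callable[[str, dict], dict] = post_json,
                     url: str = SIGN_URL):
    """Build a sign(uri, data, a1, web_session) function backed by the service."""
    def sign(uri, data=None, a1="", web_session=""):
        # Assumption: the service accepts these four fields.
        reply = post(url, {"uri": uri, "data": data,
                           "a1": a1, "web_session": web_session})
        # Assumption: the reply carries "x-s" and "x-t" keys.
        return {"x-s": reply["x-s"], "x-t": str(reply["x-t"])}
    return sign
```

Passing the transport in as `post` keeps the function easy to test without a running container: a stub that returns a fixed dict is enough.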

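Practical scenarios: what you can do with Xiaohongshu data

Scenario 1: automated competitor analysis

Suppose you run a beauty account and need to track how competitors are performing: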

from datetime import datetime


def monitor_competitors(competitor_ids):
    """Collect a snapshot of each competitor account."""
    # Assumes xhs_client was created as in step 3.
    # calculate_avg_likes is a helper you need to provide.
    competitors_data = []
    for user_id in competitor_ids:
        try:
            # Fetch the user profile
            user_info = xhs_client.get_user_info(user_id)
            # Fetch the latest notes
            user_notes = xhs_client.get_notes_by_user(user_id, page=1)
            competitor_data = {
                'user_id': user_id,
                'nickname': user_info['nickname'],
                'fans': user_info['fans'],
                'latest_note_likes': user_notes[0]['likes'] if user_notes else 0,
                'avg_likes': calculate_avg_likes(user_notes[:10]),
                'update_time': datetime.now()
            }
            competitors_data.append(competitor_data)
        except Exception as e:
            print(f"Failed to monitor user {user_id}: {e}")
    return competitors_data
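
The `monitor_competitors` function calls `calculate_avg_likes`, which this article never defines. One possible implementation is sketched below. It assumes each note is a dict with a `likes` field, as in the surrounding examples, and it skips values that cannot be read as integers; both choices are assumptions made for illustration.

```python
from typing import Iterable, Mapping


def calculate_avg_likes(notes: Iterable[Mapping]) -> float:
    """Average the 'likes' field over notes; return 0.0 for no usable notes."""
    counts = []
    for note in notes:
        try:
            # Counts may arrive as strings; a missing key counts as 0
            counts.append(int(note.get("likes", 0)))
        except (TypeError, ValueError):
            continue  # skip notes whose like count is not numeric
    return sum(counts) / len(counts) if counts else 0.0
```

Returning 0.0 for an empty list matches how the function above falls back to 0 for `latest_note_likes` when an account has no notes.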

Scenario 2: content trend analysis

def analyze_content_trend(keywords, days=7):
    """Summarize like counts for each keyword's search results."""
    # Note: the days parameter is not used by this example yet.
    trend_data = {}
    for keyword in keywords:
        # Search for matching notes
        results = xhs_client.get_note_by_keyword(
            keyword=keyword,
            page=1,
            page_size=50
        )
        items = results['items']

        # Aggregate the like counts
        total_likes = sum(note['likes'] for note in items)
        avg_likes = total_likes / len(items) if items else 0

        trend_data[keyword] = {
            'total_notes': len(items),
            'total_likes': total_likes,
            'avg_likes': avg_likes,
            'top_note': max(items, key=lambda x: x['likes']) if items else None
        }
    return trend_data

Scenario 3: monitoring your own account

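from datetime import datetime


class PersonalAccountMonitor:
    def __init__(self, xhs_client):
        self.client = xhs_client

    def daily_report(self):
        """Build the daily report."""
        report = {
            'date': datetime.now().strftime('%Y-%m-%d'),
            'notes_published': self.get_today_notes_count(),
            'total_likes': self.get_today_likes(),
            'new_followers': self.get_new_followers(),
            'top_performing_note': self.get_top_note(),
            'suggestions': self.generate_suggestions()
        }
        return report

    def get_today_notes_count(self):
        """Return the number of notes published today."""
        # Implementation goes here
        pass

    def get_today_likes(self):
        """Return today's total like count."""
        # Implementation goes here
        pass

    # daily_report() also calls the three methods below, which the article
    # does not show; they are stubbed here so the class runs as written.
    def get_new_followers(self):
        pass

    def get_top_note(self):
        pass

    def generate_suggestions(self):
        pass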

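Best practices and caveats

Compliance guidelines

Respect platform rules: comply with Xiaohongshu's user agreement and terms of service
Keep request rates reasonable: avoid high-frequency requests and leave a sensible interval between them
Use data appropriately: for personal study and research only
Protect privacy: do not collect users' private data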
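
The advice about request frequency can be sketched as a small throttle that enforces a minimum gap between calls and adds random jitter. `Throttle` and its parameters are hypothetical names, not part of xhs, and the default interval is an arbitrary illustration rather than a recommended value.

```python
import random
import time


class Throttle:
    """Enforce a minimum interval (plus random jitter) between calls."""

    def __init__(self, min_interval=2.0, jitter=1.0,
                 clock=time.monotonic, sleep=time.sleep):
        self.min_interval = min_interval
        self.jitter = jitter
        self._clock = clock    # injectable for testing
        self._sleep = sleep    # injectable for testing
        self._last = None      # time of the previous call, if any

    def wait(self) -> float:
        """Sleep if the previous call was too recent; return seconds slept."""
        delay = 0.0
        if self._last is not None:
            target = self.min_interval + random.uniform(0, self.jitter)
            elapsed = self._clock() - self._last
            delay = max(0.0, target - elapsed)
            if delay > 0:
                self._sleep(delay)
        self._last = self._clock()
        return delay
```

Calling `throttle.wait()` immediately before each request keeps the pacing logic in one place, so the interval can be tuned without touching the request code.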

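Performance tips

Use a connection pool: reuse HTTP connections
Cache results: cache data that rarely changes
Process asynchronously: use async I/O for better concurrency
Batch operations: keep the number of individual requests as low as possible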
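
As one illustration of the caching tip, a time-limited cache can wrap any fetch function so that repeated lookups within a window do not hit the network again. This is a minimal sketch; `ttl_cache` is a hypothetical decorator, not part of xhs, and it only supports positional, hashable arguments.

```python
import time
from functools import wraps


def ttl_cache(ttl_seconds: float, clock=time.monotonic):
    """Cache a function's results per argument tuple for ttl_seconds."""
    def decorator(func):
        store = {}  # maps args -> (time stored, value)

        @wraps(func)
        def wrapper(*args):
            now = clock()
            hit = store.get(args)
            if hit is not None and now - hit[0] < ttl_seconds:
                return hit[1]  # still fresh: reuse the cached value
            value = func(*args)
            store[args] = (now, value)
            return value
        return wrapper
    return decorator
```

For example, decorating a function that wraps `xhs_client.get_user_info` with `@ttl_cache(3600)` would reuse a profile for an hour. The right window depends on how quickly the data changes.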

Error handling strategy

import time


class XhsCrawler:
    def __init__(self):
        self.error_count = 0
        self.success_count = 0

    def crawl_with_retry(self, func, *args, max_retries=3, **kwargs):
        """Call func with retries; re-raise after the last failed attempt."""
        for i in range(max_retries):
            try:
                result = func(*args, **kwargs)
                self.success_count += 1
                return result
            except Exception:
                self.error_count += 1
                if i == max_retries - 1:
                    raise
                wait_time = 2 ** i  # exponential backoff
                print(f"Attempt {i + 1} failed, retrying in {wait_time} seconds...")
                time.sleep(wait_time)

    def get_status(self):
        """Return success and error counts with the success rate."""
        total = self.success_count + self.error_count
        success_rate = self.success_count / total if total > 0 else 0
        return {
            'success': self.success_count,
            'error': self.error_count,
            'success_rate': f"{success_rate:.2%}"
        }

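Conclusion: getting started with Xiaohongshu data

With xhs you can automate the collection and analysis of Xiaohongshu data. Whether you are an individual creator refining a content strategy or a business tracking competitors, the library gives you a solid foundation.

Remember that technology is only a means; the real value lies in using the data to make better decisions. Use the tool responsibly and respect the platform's rules.

Start with simple note collection and build up to a complete analysis pipeline step by step.

For more advanced usage and best practices, see the example code and documentation in the project.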


Disclosure: parts of this article were generated with AI assistance (AIGC) and are for reference only.
