Mobile QQ group chat name codes and QQ group chat name color codes - 拍拖百科

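This article takes QQ group chat data from January 1 to January 31, 2018

and uses Python to build a visual analysis of the text.

The first step is to obtain the text of the QQ group chat.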

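A web crawler is a good way to collect data from web pages.

A typical crawl breaks down into four stages:

requesting the site, disguising the request, parsing the response, and storing the result.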
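
The four stages above can be sketched with the standard library alone. This is a minimal illustration, not the crawler used for this article: the `USER_AGENT` value, the helper names (`build_request`, `fetch`, `parse_links`, `store`), and the choice to collect links are all assumptions made for the example.

```python
import csv
import io
import urllib.request
from html.parser import HTMLParser

# Hypothetical header value, used only for illustration.
USER_AGENT = "Mozilla/5.0 (X11; Linux x86_64)"


def build_request(url):
    """Stages 1 and 2: prepare the request and disguise it as a browser."""
    return urllib.request.Request(url, headers={"User-Agent": USER_AGENT})


def fetch(url, timeout=10):
    """Send the request and return the decoded page body."""
    with urllib.request.urlopen(build_request(url), timeout=timeout) as resp:
        return resp.read().decode("utf-8", errors="replace")


class LinkParser(HTMLParser):
    """Stage 3: parse the HTML and collect the href of every <a> tag."""

    def __init__(self):
        super().__init__()
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value:
                    self.links.append(value)


def parse_links(html):
    parser = LinkParser()
    parser.feed(html)
    return parser.links


def store(rows, out):
    """Stage 4: write one link per row to a CSV file-like object."""
    writer = csv.writer(out)
    writer.writerow(["link"])
    for row in rows:
        writer.writerow([row])


if __name__ == "__main__":
    # Parsing and storing work offline; fetch() needs a reachable URL.
    sample = '<p><a href="/a">A</a> <a href="/b">B</a></p>'
    buffer = io.StringIO()
    store(parse_links(sample), buffer)
    print(buffer.getvalue())
```

Keeping each stage in its own function makes it easy to swap one out, for example replacing the CSV writer with a database insert.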

A more detailed crawling workflow

also needs some additional rules.

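The text data used in this article

was exported from the QQ desktop client.

With the text exported,

I wrote the program, debugged the code, and produced the visualizations.

The full code is as follows.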

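# QQ group chat data analysis
import re
import datetime
import seaborn as sns
import matplotlib.pyplot as plt
import jieba
from wordcloud import WordCloud, STOPWORDS
# Any imread that returns the image as an array works for the word cloud mask.
from matplotlib.pyplot import imread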
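# Dates
def get_date(data):
    # All dates in YYYY-MM-DD form
    dates = re.findall(r'\d{4}-\d{2}-\d{2}', data)
    # Day of the month
    days = [date[-2:] for date in dates]
    plt.subplot(221)
    sns.countplot(x=days)
    plt.title('Days')
    # Day of the week (ISO numbering: 1 = Monday ... 7 = Sunday)
    weekdays = [datetime.date(int(date[:4]), int(date[5:7]), int(date[-2:])).isocalendar()[-1]
                for date in dates]
    plt.subplot(222)
    sns.countplot(x=weekdays)
    plt.title('WeekDays')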
# Times
def get_time(data):
    times = re.findall(r'\d{2}:\d{2}:\d{2}', data)
    # Hour of the day, plotted from 06 through 05 of the next day
    hours = [time[:2] for time in times]
    plt.subplot(223)
    sns.countplot(x=hours, order=['06', '07', '08', '09', '10', '11', '12', '13', '14', '15', '16', '17',
                                  '18', '19', '20', '21', '22', '23', '00', '01', '02', '03', '04', '05'])
    plt.title('Hours')
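# Word cloud
def get_wordclound(text_data):
    # Segment each message with jieba and join the tokens with spaces
    word_list = [" ".join(jieba.cut(sentence)) for sentence in text_data]
    new_text = ' '.join(word_list)
    pic_path = 'QQ.jpg'
    mang_mask = imread(pic_path)
    plt.subplot(224)
    # font_path is truncated here; point it at a font file that contains Chinese glyphs
    wordcloud = WordCloud(background_color="white", font_path='/home/shen/Downloads/fon',
                          mask=mang_mask, stopwords=STOPWORDS).generate(new_text)
    plt.imshow(wordcloud)
    plt.axis("off")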
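# Message content and word cloud
def get_content(data):
    # Capture the message body that follows each "date ... (account number)" header line
    pa = re.compile(r'\d{4}-\d{2}-\d{2}.*?\(\d+\)\n(.*?)\n\n', re.DOTALL)
    content = re.findall(pa, data)
    get_wordclound(content)
def run():
    filename = '新建文本文档.txt'
    with open(filename) as f:
        data = f.read()
    get_date(data)
    get_time(data)
    get_content(data)
    plt.show()

if __name__ == '__main__':
    run()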
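
To see what the three regular expressions in the listing extract, they can be run on a small sample. The layout of `SAMPLE` below is an assumption inferred from the patterns themselves (a header line with date, time, nickname and account number in parentheses, then the message, then a blank line); the nicknames, numbers and messages are invented for illustration, and the real export may differ.

```python
import re

# Hypothetical sample in the layout the regular expressions appear to expect:
# a "date time nickname(account)" header line, the message, then a blank line.
SAMPLE = (
    "2018-01-02 16:03:21 Alice(10001)\n"
    "hello everyone\n"
    "\n"
    "2018-01-02 16:05:40 Bob(10002)\n"
    "@全体成员 meeting at five\n"
    "\n"
)

# Same pattern as in get_content: skip the header, capture the message body.
MESSAGE = re.compile(r'\d{4}-\d{2}-\d{2}.*?\(\d+\)\n(.*?)\n\n', re.DOTALL)

dates = re.findall(r'\d{4}-\d{2}-\d{2}', SAMPLE)
times = re.findall(r'\d{2}:\d{2}:\d{2}', SAMPLE)
messages = MESSAGE.findall(SAMPLE)

print(dates)
print(times)
print(messages)
```

Note that the date and time patterns scan the whole text, so a date or time typed inside a message would be counted as well.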

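From the resulting visualizations, the following conclusions can be drawn.

In the Class 180 group chat between January 1 and January 31, 2018:

January 2 was the day with the most messages.

Tuesday was the most active day of the week.

16:00 was the most active hour of the day.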
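
The busiest day, weekday and hour can also be read off numerically rather than from the charts. The sketch below uses `collections.Counter`; the `stamps` list is invented sample data and `most_common` is a helper introduced here for illustration, not part of the original program.

```python
import datetime
from collections import Counter

# Hypothetical timestamps standing in for the values extracted from the export.
stamps = [
    "2018-01-02 16:03:21",
    "2018-01-02 16:05:40",
    "2018-01-09 16:30:00",
    "2018-01-10 09:15:12",
]


def most_common(values):
    """Return the value that occurs most often."""
    return Counter(values).most_common(1)[0][0]


days = [s[8:10] for s in stamps]
hours = [s[11:13] for s in stamps]
# isoweekday() uses 1 = Monday ... 7 = Sunday, like isocalendar()[-1].
weekdays = [
    datetime.date(int(s[:4]), int(s[5:7]), int(s[8:10])).isoweekday()
    for s in stamps
]

print(most_common(days), most_common(weekdays), most_common(hours))
```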

The word cloud shows that

“全体成员” ("all members") is the most frequent term.
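
The word cloud can be cross-checked with a plain frequency count. This is a small sketch under stated assumptions: `tokens` is invented data standing in for the output of `jieba.cut`, and `stop_words` and `top_terms` are hypothetical names introduced for the example.

```python
from collections import Counter

# Hypothetical tokens standing in for the output of jieba.cut on the messages.
tokens = ["全体成员", "作业", "全体成员", "的", "通知", "全体成员", "作业", "的"]

# Hypothetical stop-word set; a real analysis would use a fuller list.
stop_words = {"的"}


def top_terms(words, stop, n=3):
    """Count the words that are not stop words and return the n most frequent."""
    counts = Counter(w for w in words if w not in stop)
    return counts.most_common(n)


print(top_terms(tokens, stop_words))
```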