Research on E-mail Content Extraction Based on the MIME Mail Structure

Hu Yan, Teng Guifa, Dong Sufen, Wang Dan

(College of Information Science and Technology, Agricultural University of Hebei, Baoding 071001)

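Abstract: To extract the content of e-mail accurately, the structure of e-mail messages is analyzed in detail, the characteristic features of the message body are summarized, and an e-mail preprocessing system based on the MIME mail structure is designed. The system uses block-wise processing and feature identification to cope with the irregular formatting of e-mail, and filters reply lines and advertising lines out of the message body, so that e-mail content is extracted quickly and accurately.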

Keywords: Multipurpose Internet Mail Extensions; e-mail; preprocessing

Research on Extracting E-mail Information Based on Structure of MIME Mail

Hu Yan, Teng Guifa, Dong Sufen, Wang Dan

(School of Information Science and Technology, Agricultural University of Hebei, Baoding 071001, China)

Abstract: In order to extract the information in e-mail accurately, the structure and content features of e-mail are analyzed, and an e-mail preprocessing system based on the structure of MIME mail is designed. Using block-wise processing and feature identification, the system overcomes the shortcomings of the informal style of e-mail and filters out reply lines and advertising lines. The system thereby achieves the intended goal of extracting e-mail information quickly and accurately.

Keywords: MIME; E-mail; Preprocessing
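
The processing described in the abstract (walking the MIME parts of a message, then dropping reply and advertising lines from the body) can be illustrated with a minimal sketch built on Python's standard `email` package. This is not the authors' system: the function name `extract_body_lines`, the use of a leading `>` to recognize reply lines, and the keyword list `AD_MARKERS` are assumptions made for illustration only, and the feature set used in the paper is not reproduced here.

```python
from email import message_from_string

# Hypothetical advertising markers; the paper's actual features are not given here.
AD_MARKERS = ("unsubscribe", "click here", "special offer")


def extract_body_lines(raw: str) -> list[str]:
    """Return the cleaned text/plain body lines of a raw MIME message."""
    msg = message_from_string(raw)
    lines = []
    # walk() visits the message and every nested MIME part, one block at a time.
    for part in msg.walk():
        if part.get_content_maintype() == "multipart":
            continue  # container part, holds no text of its own
        if part.get_content_type() != "text/plain":
            continue  # skip HTML alternatives and binary parts
        if part.get_filename():
            continue  # skip attachments
        payload = part.get_payload(decode=True)  # undo base64 / quoted-printable
        charset = part.get_content_charset() or "utf-8"
        text = payload.decode(charset, errors="replace")
        for line in text.splitlines():
            stripped = line.strip()
            if not stripped:
                continue
            if stripped.startswith(">"):
                continue  # assumed reply-line feature: quoted text
            if any(marker in stripped.lower() for marker in AD_MARKERS):
                continue  # assumed advertising-line feature: keyword match
            lines.append(stripped)
    return lines


# A small two-part sample message, invented for this illustration.
SAMPLE = (
    "From: a@example.com\n"
    "To: b@example.com\n"
    "Subject: test\n"
    "MIME-Version: 1.0\n"
    'Content-Type: multipart/alternative; boundary="XX"\n'
    "\n"
    "--XX\n"
    'Content-Type: text/plain; charset="utf-8"\n'
    "\n"
    "Thanks, the meeting is at 10.\n"
    "> When is the meeting?\n"
    "Click here to unsubscribe\n"
    "--XX\n"
    'Content-Type: text/html; charset="utf-8"\n'
    "\n"
    "<p>Thanks, the meeting is at 10.</p>\n"
    "--XX--\n"
)

print(extract_body_lines(SAMPLE))
```

In this sketch only the `text/plain` part contributes to the result; the quoted line and the line containing an advertising keyword are discarded. A production system would need richer features than a `>` prefix and a fixed keyword list to cope with irregularly formatted mail.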

 

现代图书情报技术 (New Technology of Library and Information Service), cumulative issue no. 164, 2008, no. 5, pp. 85-88