How to extract paragraphs from text/pdf using nltk?

I want to extract paragraphs from a large text file. The basic idea is to extract each section of a PDF. I know the following: each section begins with a number such as 7.1, 7.2, etc. So all the text before 7.2 belongs to 7.1. Similarly, if I extract all the text before the first occurrence of 7.3 and subtract the text of 7.1, that gives me 7.2. Is there a way to do that in nltk?
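One possible approach, sketched below, does not use NLTK for the splitting itself: it uses Python's standard `re` module to find the section numbers and slices the text between consecutive headings. It assumes the PDF has already been converted to plain text and that every section heading is a number like `7.1` at the start of a line. The name `split_sections` and the heading pattern are illustrative assumptions, not part of any library.

```python
import re

# Assumption: a section heading is a number such as "7.1" at the start of a line.
# Numbers that appear mid-line (e.g. "see 7.1 above") are not treated as headings.
SECTION_HEADING = re.compile(r"^(\d+\.\d+)\b", re.MULTILINE)


def split_sections(text):
    """Map each section number to the text between its heading and the next one."""
    matches = list(SECTION_HEADING.finditer(text))
    sections = {}
    for i, match in enumerate(matches):
        # A section ends where the next heading starts, or at the end of the text.
        end = matches[i + 1].start() if i + 1 < len(matches) else len(text)
        sections[match.group(1)] = text[match.end():end].strip()
    return sections
```

Calling `split_sections(text)["7.2"]` would then return the body of section 7.2 directly, with no need to subtract the text of 7.1. If sentence-level or word-level units are needed afterwards, each section's text could be passed to an NLTK tokenizer such as `nltk.sent_tokenize`.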
