Finding path for corpus in NLTK

I am using the Natural Language Toolkit (NLTK) for Python to write a program that loads a corpus of my own files. To do that, I am using code along these lines:

    from nltk.corpus import PlaintextCorpusReader
    corpus_root = (insert filepath here)
    wordlists = PlaintextCorpusReader(corpus_root, '.*')

Let's say my file is called reader.py and my corpus files are in a directory called 'corpus' in the same directory as reader.py. I would like a general way to find the path above, so that the code can locate the 'corpus' directory wherever the project is placed, for anyone using it. I have tried the answers in this post, but they only give me absolute file paths: Find current directory and file's directory

Any help would be greatly appreciated!


From what I understand:

  • Your reader.py file and corpus directory are always in the same directory
  • You're looking for a way to refer to corpus from reader.py regardless of where you put them in your directory structure

In that case, the question that you referred to seems to be what you need. Another way of doing it is in this other answer. Using that second option, your code would then be:

    import os.path

    from nltk.corpus import PlaintextCorpusReader

    # Directory that contains this script (reader.py)
    basepath = os.path.dirname(__file__)
    # The 'corpus' directory sits next to reader.py
    corpus_root = os.path.abspath(os.path.join(basepath, "corpus"))
    wordlists = PlaintextCorpusReader(corpus_root, '.*')

Keep in mind that although an absolute path is created, it is built from the basepath = os.path.dirname(__file__) line above, which yields the directory that contains reader.py. The result therefore follows reader.py wherever the two are moved together. Have a look at the official documentation for more details.
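For comparison, the same idea can be sketched with the standard library's pathlib module. This is only an illustration, not part of the linked answers: the helper name sibling_dir and the example path /home/alice/project/reader.py are made up for the sketch, and in reader.py itself you would pass __file__ instead of a literal path.

```python
from pathlib import Path


def sibling_dir(script_file, name="corpus"):
    """Return the absolute path of the directory `name` next to `script_file`.

    `sibling_dir` is a hypothetical helper written for this example.
    """
    # resolve() makes the path absolute, so the result does not depend on
    # the current working directory of whoever runs the script.
    return Path(script_file).resolve().parent / name


# In reader.py this would be sibling_dir(__file__); the literal path below
# is a stand-in so the snippet runs on its own.
corpus_root = sibling_dir("/home/alice/project/reader.py")
print(corpus_root)
```

The value can then be handed to the reader as PlaintextCorpusReader(str(corpus_root), '.*'). Note that __file__ is not defined in an interactive interpreter session, so this approach (like the os.path version) assumes the code runs from a script or module file.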
