Finding path for corpus in NLTK
I am using the Natural Language Toolkit for Python to write a program. In it I am trying to load a corpus of my own files. To do that I am using code to the following effect:
from nltk.corpus import PlaintextCorpusReader
corpus_root=(insert filepath here)
wordlists=PlaintextCorpusReader(corpus_root, '.*')
Let's say my file is called reader.py and my corpus of files is located in a directory called 'corpus' in the same directory as reader.py. I would like a general way to find the filepath above, so that my code can locate the 'corpus' directory wherever it is installed and for anyone who uses it. I have looked at this post, but it only shows me how to get absolute file paths: Find current directory and file's directory
Any help would be greatly appreciated!
From what I understand, the reader.py file and the corpus directory are always in the same directory, and you want to reach corpus from reader.py regardless of where you put them in your directory structure. In that case, the question that you referred to seems to be what you need. Another way of doing it is in this other answer. Using that second option, your code would then be:
from nltk.corpus import PlaintextCorpusReader
import os.path
# Directory that contains this script (reader.py)
basepath = os.path.dirname(__file__)
# Absolute path of the 'corpus' directory next to reader.py
corpus_root = os.path.abspath(os.path.join(basepath, "corpus"))
wordlists = PlaintextCorpusReader(corpus_root, '.*')
Keep in mind that while an absolute path is created, it is derived from the basepath = os.path.dirname(__file__) line above, which yields the directory that contains reader.py. The result therefore follows the script wherever it is placed. Have a look at the Python documentation for the official details.
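To make the point about the script's location concrete, here is a minimal sketch of the same idea wrapped in a helper. The function name corpus_root_for and its dirname parameter are hypothetical, introduced only for illustration; the sketch assumes the corpus directory sits next to the script, as described in the question.

```python
import os.path


def corpus_root_for(script_path, dirname="corpus"):
    """Return the absolute path of `dirname` located next to `script_path`.

    `corpus_root_for` is a hypothetical helper, not part of NLTK.
    """
    # Resolve the script path first, so that a relative path is
    # anchored before we take its parent directory.
    script_dir = os.path.dirname(os.path.abspath(script_path))
    # The corpus directory is assumed to be a sibling of the script.
    return os.path.join(script_dir, dirname)


# Inside reader.py you would call it like this:
#     corpus_root = corpus_root_for(__file__)
#     wordlists = PlaintextCorpusReader(corpus_root, '.*')
```

Because the result depends only on where the script file lives, it should not matter which directory the program is started from.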