Parsing inflected non

2018-06-26 03:22:12

Taking an example from the Introduction to Latin on Wikiversity, consider the sentence:

the sailor gives the girl money

We can handle this in Prolog with a DCG fairly elegantly with this pile of rules:

sentence(s(NP, VP)) --> noun_phrase(NP), verb_phrase(VP).
noun_phrase(Noun) --> det, noun(Noun).
noun_phrase(Noun) --> noun(Noun).
verb_phrase(vp(Verb, DO, IO)) --> verb(Verb), noun_phrase(IO), noun_phrase(DO).

det --> [the].
noun(X) --> [X], { member(X, [sailor, girl, money]) }.
verb(gives) --> [gives].

And we see that this works:

?- phrase(sentence(S), [the,sailor,gives,the,girl,money]).
S = s(sailor, vp(gives, money, girl)) ;

It seems to me that the DCG is really optimized for handling word-order languages. I'm at a complete loss as to how to handle this Latin sentence:

 nauta dat pecuniam puellae

This means the same thing (the sailor gives the girl money), but the word order is completely free: all of these permutations also mean exactly the same thing:

nauta dat puellae pecuniam
nauta puellae pecuniam dat
puellae pecuniam dat nauta
puellae pecuniam nauta dat
dat pecuniam nauta puellae

The first thing that occurs to me is to enumerate the permutations:

sentence(s(NP, VP)) --> noun_phrase(NP), verb_phrase(VP).
sentence(s(NP, VP)) --> verb_phrase(VP), noun_phrase(NP).

but this won't do. While nauta belongs to the subject noun phrase, puellae belongs to the object noun phrase and is subordinate to the verb, yet it can precede the verb. I wonder if I should approach it by first building some kind of attributed list, like so:

?- attributed([nauta,dat,pecuniam,puellae], Attributed).
Attributed = [noun(nauta,nom), verb(do,3,s), noun(pecunia,acc), noun(puella,dat)]

This seems like it will turn out to be necessary (and I don't see a good way to do it), but grammatically it's pushing food around on my plate. Maybe I could write a parser with some kind of horrifying non-DCG contraption like this:

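parse(s(NounPhrase, VerbPhrase), Attributed) :-
  parse(subject_noun_phrase(NounPhrase), Attributed),
  parse(verb_phrase(VerbPhrase), Attributed).

% subject: any nominative noun in the list
parse(subject_noun_phrase(Noun), Attributed) :-
  member(noun(Noun,nom), Attributed).

% object: any accusative noun in the list
parse(object_noun_phrase(Noun), Attributed) :-
  member(noun(Noun,acc), Attributed).

% verb phrase: the verb together with its accusative object
parse(verb_phrase(vp(Verb, Object)), Attributed) :-
  member(verb(Verb,_,_), Attributed),
  parse(object_noun_phrase(Object), Attributed).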

This seems like it would work, but only as long as I have no recursion; as soon as I introduce a subordinate clause I'm going to reuse subjects in an unhealthy way.
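One way to keep words from being reused is to remove each word from the list as soon as it fills a role, and to pass the remainder on. The sketch below is only an illustration under assumptions: the predicate name parse_consuming/2 is made up for this example, it takes the attributed list in the format shown above as given, and it covers just the single flat clause from the question.

```prolog
% parse_consuming(+Attributed, -Tree)  (hypothetical name)
% Each constituent is removed from the list with select/3, so no word
% can fill two roles, and the order of the words does not matter.
parse_consuming(Attributed, s(Subject, vp(Verb, DO, IO))) :-
    select(noun(Subject, nom), Attributed, Rest0),   % subject: nominative
    select(verb(Verb, 3, s),   Rest0,      Rest1),   % finite verb, 3rd sg.
    select(noun(DO, acc),      Rest1,      Rest2),   % direct object: accusative
    select(noun(IO, dat),      Rest2,      []).      % indirect object; nothing left over
```

For example:

?- parse_consuming([noun(puella,dat), noun(pecunia,acc), verb(do,3,s), noun(nauta,nom)], T).
T = s(nauta, vp(do, pecunia, puella)).

For a subordinate clause, the same idea might be extended by letting the inner clause return its leftover list to the outer one instead of insisting on [], though that is not worked out here.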

I just don't see how to get from a non-word-order sentence to a parse tree. Is there a book that discusses this? Thanks.

I found a related resource here (PERMUTATIONAL GRAMMAR FOR FREE WORD ORDER LANGUAGES). It seems worth reading (hey, we all hated those mandatory Latin lessons so much, back in the 60s!).

The appendix contains an implementation you can test.

I forgot to point out Covington's free-word-order parser (it's just a sketch...). You can find it in the PRoNTo toolkit (I mention it here for the sake of completeness, but I'm fairly sure you already know about it).

It seems (drawing from my extremely rusty memory of high school Latin) that your lexical analyzer needs to look at each token (word) and annotate it with the appropriate metadata:

type of word (noun, verb, adjective, etc.)

For nouns, declension, gender, case and number

For verbs, conjugation, person, number, tense, voice and mood

For adjectives, gender, declension, number...

etc. (It's been a long time LOL).

Then your parse should be guided by the metadata, since that's what ties everything together.
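As a rough illustration of a metadata-guided parse, the sketch below tags each word from a toy lexicon and then finds the subject by case and number agreement rather than by position. Everything in it is an assumption made for the example: the predicate names lex/2, tag/2 and subject_verb/3, the tag shapes, and the handful of word forms listed.

```prolog
% Hypothetical toy lexicon: lex(Form, Tag). Nouns carry lemma, case and
% number; verbs carry lemma, person and number.
lex(nauta,    noun(nauta,   nom, sg)).
lex(nautae,   noun(nauta,   nom, pl)).
lex(pecuniam, noun(pecunia, acc, sg)).
lex(puellae,  noun(puella,  dat, sg)).
lex(puellae,  noun(puella,  nom, pl)).   % same form, different reading
lex(dat,      verb(do, 3, sg)).
lex(dant,     verb(do, 3, pl)).

% tag(+Words, -Tagged): look every word up in the lexicon.
% Ambiguous forms are resolved by backtracking.
tag(Words, Tagged) :-
    maplist(lex, Words, Tagged).

% subject_verb(+Words, -Subject, -Verb): the subject is a nominative noun
% whose number matches the finite verb, wherever the two occur.
subject_verb(Words, Subject, Verb) :-
    tag(Words, Tagged),
    member(noun(Subject, nom, Number), Tagged),
    member(verb(Verb, 3, Number), Tagged).
```

With this lexicon, subject_verb([puellae,pecuniam,dat,nauta], S, V) gives S = nauta, V = do, while subject_verb([puellae,pecuniam,dant], S, V) gives S = puella, because only the nominative plural reading of puellae agrees with dant. The query subject_verb([nauta,dant], S, V) fails on the number mismatch.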

You can use this meta-rule:

unsorted([]) --> [].
unsorted([H|T]) -->
    H, unsorted(T).
unsorted([H|T]) -->
    unsorted(T), H.

sentence(s(NP, VP)) --> unsorted([noun_phrase(NP), verb_phrase(VP)]).
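Note that unsorted//1 places the head of the list either first or last, so with three or more constituents some orderings (those with the head in the middle) are never tried. A variant that tries every ordering can be sketched with permutation/2 from library(lists). The names any_order//1 and in_sequence//1, the one-word constituents, and the flat four-part sentence rule are assumptions made for this illustration, and the cost grows factorially with the number of constituents.

```prolog
:- use_module(library(lists)).   % permutation/2

% any_order(+Goals)//0: parse the nonterminals in Goals in some order.
any_order(Goals) -->
    { permutation(Goals, Ordered) },
    in_sequence(Ordered).

% in_sequence(+Goals)//0: parse the nonterminals one after another.
in_sequence([]) --> [].
in_sequence([G|Gs]) --> G, in_sequence(Gs).

% Toy one-word constituents for the example sentence (hypothetical).
subject(nauta)          --> [nauta].
verb(do)                --> [dat].
direct_object(pecunia)  --> [pecuniam].
indirect_object(puella) --> [puellae].

sentence(s(S, vp(V, DO, IO))) -->
    any_order([subject(S), verb(V), direct_object(DO), indirect_object(IO)]).
```

For example:

?- phrase(sentence(T), [puellae,pecuniam,dat,nauta]).
T = s(nauta, vp(do, pecunia, puella)).

The other orderings listed in the question produce the same tree.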
