I need to do some parsing: the goal is to write grammar rules that will be applied to a corpus. My question: is it possible to use a list within a grammar?

Example:

1) Open the text to be analyzed
2) Write the grammatical rules (just an example):
   grammar("""
   S -> NP VP
   NP -> DET N
   VP -> V N
   DET -> list_det.txt
   N -> list_n.txt
   V -> list.txt""")
3) Print the result with the entries that obey this grammar

Is it possible?

  • Should be possible. What are you doubting? Commented Aug 31, 2017 at 13:54
  • I don't know how to refer to an external list inside a grammar. I also doubt whether it is possible, since we are talking about the lexicon... Commented Aug 31, 2017 at 14:03
  • Are "list_det.txt", "list_n.txt", and "list.txt" the names of files whose contents should be included into the grammar where their names currently appear? Commented Sep 2, 2017 at 0:58

1 Answer

Here is a quick conceptual prototype of your grammar, using pyparsing. I could not tell from your question what the contents of the N, V, and DET lists could be, so I just arbitrarily chose words composed of 'n's and 'v's, and the literal 'det'. You can replace the <<= assignments with the correct expressions for your grammar, but this parser and the sample string should show that your grammar is at least feasible. (If you edit your question to show what N, V, and DET are lists of, I can update this answer with less arbitrary expressions and sample. Also including a sample string to be parsed would be useful.)

I also added some grouping so that you can see how the structure of the grammar is reflected in the structure of the results. You can leave this in or remove it, and the parser will still work.

import pyparsing as pp

v = pp.Forward()
n = pp.Forward()
det = pp.Forward()

V = pp.Group(pp.OneOrMore(v))
N = pp.Group(pp.OneOrMore(n))
DET = pp.Group(pp.OneOrMore(det))

VP = pp.Group(V + N)
NP = pp.Group(DET + N)
S = NP + VP

# replace these with something meaningful
v <<= pp.Word('v')
n <<= pp.Word('n')
det <<= pp.Literal('det')

sample = 'det det nn nn nn nn vv vv vv nn nn nn nn'

parsed = S.parseString(sample)
print(parsed.asList())

Prints:

[[['det', 'det'], ['nn', 'nn', 'nn', 'nn']], 
 [['vv', 'vv', 'vv'], ['nn', 'nn', 'nn', 'nn']]]

EDIT:

I guessed that "NP" and "VP" are "noun phrase" and "verb phrase", but I don't know what "DET" could be. Still, I made up a less abstract example. I also expanded the lists to accept more grammatical forms of lists of nouns and verbs, joined by 'and', 'or', and commas.

import pyparsing as pp

v = pp.Forward()
n = pp.Forward()
det = pp.Forward()

def collectionOf(expr):
    '''
    Compose a collection expression for a base expression that matches
        expr
        expr and expr
        expr, expr, expr, and expr
    '''
    AND = pp.Literal('and')
    OR = pp.Literal('or')
    COMMA = pp.Suppress(',')
    return expr + pp.Optional(
            pp.Optional(pp.OneOrMore(COMMA + expr) + COMMA) + (AND | OR) + expr)

V = pp.Group(collectionOf(v))('V')
N = pp.Group(collectionOf(n))('N')
DET = pp.Group(pp.OneOrMore(det))('DET')

VP = pp.Group(V + N)('VP')
NP = pp.Group(DET + N)('NP')
S = pp.Group(NP + VP)('S')

# replace these with something meaningful
v <<= pp.Combine(pp.oneOf('chase love hate like eat drink') + pp.Optional(pp.Literal('s')))
n <<= pp.Optional(pp.oneOf('the a my your our his her their')) + pp.oneOf("dog cat horse rabbit squirrel food water")
det <<= pp.Optional(pp.oneOf('why how when where')) + pp.oneOf('do does did')

samples = '''
    when does the dog eat the food
    does the dog like the cat
    do the horse, cat, and dog like or hate their food
    do the horse and dog love the cat
    why did the dog chase the squirrel
'''
S.runTests(samples)

Prints:

when does the dog eat the food
[[[['when', 'does'], ['the', 'dog']], [['eat'], ['the', 'food']]]]
- S: [[['when', 'does'], ['the', 'dog']], [['eat'], ['the', 'food']]]
  - NP: [['when', 'does'], ['the', 'dog']]
    - DET: ['when', 'does']
    - N: ['the', 'dog']
  - VP: [['eat'], ['the', 'food']]
    - N: ['the', 'food']
    - V: ['eat']


does the dog like the cat
[[[['does'], ['the', 'dog']], [['like'], ['the', 'cat']]]]
- S: [[['does'], ['the', 'dog']], [['like'], ['the', 'cat']]]
  - NP: [['does'], ['the', 'dog']]
    - DET: ['does']
    - N: ['the', 'dog']
  - VP: [['like'], ['the', 'cat']]
    - N: ['the', 'cat']
    - V: ['like']


do the horse, cat, and dog like or hate their food
[[[['do'], ['the', 'horse', 'cat', 'and', 'dog']], [['like', 'or', 'hate'], ['their', 'food']]]]
- S: [[['do'], ['the', 'horse', 'cat', 'and', 'dog']], [['like', 'or', 'hate'], ['their', 'food']]]
  - NP: [['do'], ['the', 'horse', 'cat', 'and', 'dog']]
    - DET: ['do']
    - N: ['the', 'horse', 'cat', 'and', 'dog']
  - VP: [['like', 'or', 'hate'], ['their', 'food']]
    - N: ['their', 'food']
    - V: ['like', 'or', 'hate']


do the horse and dog love the cat
[[[['do'], ['the', 'horse', 'and', 'dog']], [['love'], ['the', 'cat']]]]
- S: [[['do'], ['the', 'horse', 'and', 'dog']], [['love'], ['the', 'cat']]]
  - NP: [['do'], ['the', 'horse', 'and', 'dog']]
    - DET: ['do']
    - N: ['the', 'horse', 'and', 'dog']
  - VP: [['love'], ['the', 'cat']]
    - N: ['the', 'cat']
    - V: ['love']


why did the dog chase the squirrel
[[[['why', 'did'], ['the', 'dog']], [['chase'], ['the', 'squirrel']]]]
- S: [[['why', 'did'], ['the', 'dog']], [['chase'], ['the', 'squirrel']]]
  - NP: [['why', 'did'], ['the', 'dog']]
    - DET: ['why', 'did']
    - N: ['the', 'dog']
  - VP: [['chase'], ['the', 'squirrel']]
    - N: ['the', 'squirrel']
    - V: ['chase']
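
If the terminals really are long word lists kept in files, as the names in the question suggest, one possible approach is to read each file and build the terminal expression from its contents. The following is only a sketch under assumptions: I am assuming each file holds one word per line, the file contents written here are made-up stand-ins, and `load_words`, `terminal`, and `parse_or_none` are hypothetical helper names, not part of pyparsing. It also filters a small invented corpus down to the sentences that obey the grammar.

```python
import tempfile
from pathlib import Path

import pyparsing as pp


def load_words(path):
    """Return the non-empty lines of a word-list file, stripped of whitespace."""
    with open(path, encoding="utf-8") as f:
        return [line.strip() for line in f if line.strip()]


def terminal(path):
    """Build an expression that matches any one whole word listed in the file."""
    # Keyword matches whole words only, so "a" will not match inside "apple".
    return pp.MatchFirst([pp.Keyword(word) for word in load_words(path)])


# Demo only: write tiny stand-in files so the sketch runs by itself.
# The contents are invented; your real lists would replace them.
workdir = Path(tempfile.mkdtemp())
(workdir / "list_det.txt").write_text("the\na\nmy\n", encoding="utf-8")
(workdir / "list_n.txt").write_text("dog\ncat\nfood\n", encoding="utf-8")
(workdir / "list.txt").write_text("eats\nlikes\nchases\n", encoding="utf-8")

# Terminals come from the files; the rules mirror the grammar in the question.
DET = terminal(workdir / "list_det.txt")
N = terminal(workdir / "list_n.txt")
V = terminal(workdir / "list.txt")

NP = pp.Group(DET + N)   # NP -> DET N
VP = pp.Group(V + N)     # VP -> V N
S = NP + VP              # S  -> NP VP


def parse_or_none(sentence):
    """Return the parse as nested lists, or None if the sentence does not fit S."""
    try:
        return S.parseString(sentence, parseAll=True).asList()
    except pp.ParseException:
        return None


# Invented corpus: keep only the entries that obey the grammar.
corpus = ["my dog likes food", "the cat eats food", "food likes the"]
accepted = [s for s in corpus if parse_or_none(s) is not None]

for sentence in accepted:
    print(sentence, parse_or_none(sentence))
```

A `MatchFirst` over many `Keyword` expressions tries the words one after another, so it may become slow for very large lexicons; at that size a dedicated NLP library is probably the better fit.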
3 Comments

DET is determiner, a word class that includes articles, demonstratives, and quantifiers (among other things).
Great!! The problem is that V, DET, and N are lists with a great many entries, so it is difficult to write them directly into a program. I need to call these lists from outside, and I don't know if that's possible...
This is about as far as pyparsing will reasonably go: you will quickly have to deal with verb tenses, irregular plurals, etc., and then it gets really ugly. At this point you are really doing NLTK-style work, and through Google I'm sure you will find some Python natural-language libraries. I hope this at least demonstrates that your grammar is workable.
