Dependency Parsing

A dependency structure shows which words depend on other words in a sentence. The below diagram shows the dependency structure for the sentence: "Bills on ports and immigration were submitted by Senator Brownback, Republican of Kansas" (source)

dependency_structure

Each arrow starts from the head and goes towards the dependent. Sometimes a ROOT is added as the head of the tree so that every word is dependent on one head.

To start off, we can look at a simpler example using the sentence "I ate fish".

i_ate_fish

The first (green) arrow starts from ate (head) and goes towards I (dependent). The subject of ate is I. This is a left arc.

The next arrow starts from ate (head) and goes towards fish (dependent). The object of ate is fish. This is a right arc.

The method of dependency parsing we're using is called transition-based parsing. It's transition based because at each step of parsing, there is a transition that can be applied. Using the I ate fish example again, please see the below diagram (source). There are two columns. The column on the right represents the "stack", where we look at the words that can be parsed. The right column represents the "buffer", where the words are waiting to be parsed.

First off, the stack starts with [ROOT]. Then you shift one word over, I. Since there's no dependencies, you shift another word over, ate. Now we have the first dependency, a left arc. The dependent word is removed from the stack. Since there's no more dependencies, another shift happens. With ate and fish, there's a right arc dependency, and removing the dependent word fish from the stack. Then, the buffer is left empty and the stack only has one word remaining.

iatefish_parsing

We can translate this into a program now.

class PartialParse(object):
def __init__(self, sentence):
"""Initializes this partial parse.

@param sentence (list of str): The sentence to be parsed as a list of words.
Your code should not modify the sentence.
self.stack: The current stack represented as a list with the top of the stack as the
last element of the list.
self.buffer: The current buffer represented as a list with the first item on the
buffer as the first item of the list
self.dependencies: The list of dependencies produced so far. Represented as a list of
tuples where each tuple is of the form (head, dependent).
Order for this list doesn't matter.
"""

self.stack = ["ROOT"]
self.buffer = sentence.copy()
self.dependencies = []

def parse_step(self, transition):
"""Performs a single parse step by applying the given transition to this partial parse

@param transition (str): A string that equals "S", "LA", or "RA" representing the shift,
left-arc, and right-arc transitions. You can assume the provided
transition is a legal transition.
"""

if transition == "S":
word = self.buffer.pop(0)
self.stack.append(word)
elif transition == "LA":
head = self.stack[-1]
dependent = self.stack.pop(-2) # Remove second last item in stack
dependency = (head, dependent)
self.dependencies.append(dependency)
elif transition == "RA":
head = self.stack[-2]
dependent = self.stack.pop(-1) # Remove last item in stack
dependency = (head, dependent)
self.dependencies.append(dependency)

First, the stack, buffer, and dependencies are initialized. Then, under the function parse_step, we look at what transition is being applied, with S representing a shift, LA representing a left arc and RA representing a right arc. For LA and RA, the dependent is removed from the stack, and the (head, dependent) pair is added to the dependencies list.

References

Stanford CS224N lecture 5
Stanford CS224N assignment 3