Week 4

Pre-evaluation Preparation + Exploring Potential Issues for Current NSpM

Posted by Stuart Chen on June 23, 2019


Time flies! It’s already summer now!

We’d better prepare well for the first evaluation next week!

This week, I tried another idea to annotate the potential entities in the natural language questions more exhaustively.

I have also gone through almost all of the existing templates to gain a deeper insight into how the current model works.

Since the network connectivity in my country is intermittently unreliable, I wrote a small module to replace the SPARQLWrapper Python library for getting results from the Virtuoso endpoint.

  1. Constituency parsing might be more effective for extracting the structure of a natural language sentence.

  2. The current NSpM model might be highly dependent on the restricted NL-SPARQL vocabulary key-value
    pairs to conduct the neural machine translation from the pre-processed question text to the SPARQL query vocabulary.

  3. While training the models, I noticed that adding an entity annotation function that maps the natural
    language text to DBpedia entities could improve the entity recognition accuracy, and therefore the overall performance; see the sketch after this list.
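For the third point, one possible way to plug in such an annotation step is to call the DBpedia Spotlight web service and collect the entity URIs it finds in a question. The snippet below is only a sketch of the idea, not part of the current NSpM code; the public endpoint URL and the confidence threshold are my own assumptions.

    # Sketch: annotate DBpedia entities in a question via DBpedia Spotlight.
    # The endpoint URL and the confidence value are assumptions for illustration.
    import requests

    def annotate_entities(text, confidence=0.5):
        response = requests.get(
            "https://api.dbpedia-spotlight.org/en/annotate",
            params={"text": text, "confidence": confidence},
            headers={"Accept": "application/json"},
        )
        response.raise_for_status()
        # Each resource carries the matched surface form and its DBpedia URI.
        resources = response.json().get("Resources", [])
        return [(r["@surfaceForm"], r["@URI"]) for r in resources]

    # For example, "Barack Obama" should be mapped to dbr:Barack_Obama.
    print(annotate_entities("Where was Barack Obama born?"))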

1. Pre-evaluation Preparation

1.1 Constituency Parsing Might Be a Good Way to Get the Syntactic Structure

What is it? How is it different from the dependency parsing approach used previously?

Syntactic analysis is a very important part of analyzing sentences. It includes both constituency parsing and dependency parsing, which differ significantly.

Let’s take an example. A constituency parse tree breaks a text into sub-phrases: the non-terminals in the tree are phrase types, the terminals are the words in the sentence, and the edges are unlabeled. For the simple sentence “John sees Bill”, a constituency parse would be:

[Figure: constituency parse tree of “John sees Bill”]
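In case the figure does not render, the same tree can be printed as text with NLTK. This is just a quick sketch and assumes the nltk package is installed.

    # Text rendering of the constituency parse for "John sees Bill".
    # Assumes nltk is installed (pip install nltk).
    from nltk import Tree

    parse = Tree.fromstring("(S (NP John) (VP (V sees) (NP Bill)))")
    parse.pretty_print()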

It might do better at capturing the relation in the (Object, Verb, Subject) triple. However, it distracts the focus from the more important pairing of (Object + Property). So I still think dependency parsing is more pertinent to our problem in this respect; see the comparison sketch below.
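For comparison, a dependency parse of the same sentence links the verb directly to its subject and object, which makes the pairing easier to read off. The sketch below assumes spaCy and its small English model are installed; it is only meant as an illustration.

    # Rough sketch of a dependency parse of the same sentence.
    # Assumes: pip install spacy && python -m spacy download en_core_web_sm
    import spacy

    nlp = spacy.load("en_core_web_sm")
    doc = nlp("John sees Bill")
    for token in doc:
        # e.g. "John" -> nsubj of "sees", "Bill" -> dobj of "sees"
        print(token.text, token.dep_, token.head.text)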

1.2 A Simple Self-made Method for When SPARQLWrapper Cannot Connect

The design goal is to post a request to the Virtuoso endpoint, so we simply do this:

    import requests

    def getVirtuoso(query):
        # Parameters expected by the Virtuoso SPARQL endpoint; the 'format'
        # field asks the endpoint to return JSON result bindings.
        data = {
            'query': query,
            'default-graph-uri': 'http://dbpedia.org',
            'format': 'application/sparql-results+json'
        }
        response = requests.post("https://dbpedia.org/sparql", data=data)
        try:
            # Parse the JSON result bindings if the endpoint returned JSON.
            content = response.json()
        except ValueError:
            # Fall back to the raw text (e.g. an HTML/XML error page).
            content = response.text
        return content
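A quick sanity check of the helper could look like the following; the query here is just an arbitrary example that uses full URIs so that no prefix declarations are needed.

    # Hypothetical usage example: fetch a few labels of the dbr:SPARQL resource.
    query = """
    SELECT ?label WHERE {
        <http://dbpedia.org/resource/SPARQL>
            <http://www.w3.org/2000/01/rdf-schema#label> ?label
    } LIMIT 5
    """
    print(getVirtuoso(query))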

2. Exploring Potential Issues for the Current NSpM Model

The Duplicated Key Issue in Data Generation from Templates

While running the given generator.py, the terminal showed a warning that an existing key was being assigned an additional value.

I then opened the dataset file generated by generator.py and searched for the redundant key-value pair. One vocabulary entry appeared twice in the dataset, which is to blame for the duplicated mapping problem.
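To double-check which entries are duplicated, a small script like this can be run over the generated file; the file name here is only a placeholder, not necessarily the real output name of generator.py.

    # Sketch: list entries that appear more than once in a generated file.
    # "data.en" is a placeholder file name for illustration only.
    from collections import Counter

    with open("data.en") as f:
        counts = Counter(line.strip() for line in f)

    duplicates = [entry for entry, n in counts.items() if n > 1]
    print(duplicates)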

This gave me an idea: why not use a comprehensive set of embedding vectors to represent the vocabulary that maps to the knowledge graph entities in DBpedia? A rough sketch follows the list below.

This might help with two problems:

1) the restricted vocabulary, which fails when the model encounters entity vocabulary it never saw during training;

2) the global representation in the bi-LSTM encoding layer, which cannot accurately handle the recognition of entities in natural language.
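Here is a minimal sketch of that idea: keep an embedding table for the known vocabulary and resolve an unseen mention to the nearest known entity by cosine similarity. The vectors below are invented purely for illustration; the real embeddings would of course come from a trained model.

    # Toy sketch: nearest-neighbour lookup in an embedding table, so that an
    # unseen surface form can still be mapped to a known vocabulary entry.
    # The vectors are made up for illustration only.
    import numpy as np

    vocab_embeddings = {
        "dbr:Barack_Obama": np.array([0.9, 0.1, 0.0]),
        "dbr:Berlin": np.array([0.1, 0.8, 0.3]),
    }

    def nearest_entity(query_vector):
        def cosine(a, b):
            return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))
        return max(vocab_embeddings,
                   key=lambda name: cosine(query_vector, vocab_embeddings[name]))

    # A vector produced for an unseen mention would be resolved to the
    # closest known entity, e.g. this one maps to dbr:Barack_Obama.
    print(nearest_entity(np.array([0.85, 0.2, 0.05])))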