Syntactic-Semantic Dependency Correlation in Semantic Role Labeling: A Shift in Semantic Label Distributions

Semantic Role Labeling (SRL) is an important component of natural language understanding (Palmer et al. 2013). It identifies the semantic roles of words, abstracting away superficial variations in expression. Semantic dependencies (upper part of Figure 1) embody such information, specifying, for example, the semantic role label actor of the argument it with respect to the predicate eliminate. Semantic dependencies provide useful information for tasks such as question answering (Yih et al. 2014), automatic summarization (Jin et al. 2020; Kumar and Raghuveer 2012), text mining (Poria et al. 2014), and beyond. On the syntax side, a similar notion of syntactic dependencies (lower part of Figure 1) arises to encode grammatical relations. A syntactic dependency specifies, for instance, the grammatical relation modifier of the dependent word hardly to the head word eliminate. The two types of dependencies correlate heavily with each other, creating many parallelisms between them. Figure 1 shows an example of such parallelism, where pigeon is both an experiencer argument and an object dependent of the verb predicate eliminate. In general, words that are object dependents of a verb predicate are often experiencer arguments of that predicate. This parallelism motivates a large body of literature enhancing machine learning-based semantic parsers with syntactic dependencies, including He et al. (2018) and Roth and Lapata (2016). While those works lead to performance improvements, most stop at the score uplift and leave open a more fundamental question: what underlies the correlation between syntactic and semantic dependencies? In our paper (Chen et al. 2022), we study the statistical property underlying the dependency correlation.
We interpret the dependency correlation as a shift in semantic label distributions. The label distribution (Dozat and Manning 2018) models the distribution over semantic role labels for a dependency spanning the predicate and the argument; a non-relation label indicates no semantic dependency. We found that the label distribution changes significantly with the hop patterns of the shortest syntactic dependency path (SSDP) connecting the predicate and the argument. The hop pattern reflects the length of syntactic dependencies. The distribution shift corresponds to our changing expectations for semantic dependencies: for example, we would have a high expectation for a semantic dependency given a short syntactic dependency, but a low expectation given a long one. We model the distribution shift using a mixture model-based semantic parser, which we explain in the next section. Compared to previous syntax-aware semantic parsing methods, modeling the distribution shift improves performance in predicting short-distance semantic dependencies while retaining the performance advantage on long-distance dependencies. Modeling the distribution shift also provides a small but significant performance uplift over syntax-aware semantic parsing baselines, yielding an SRL system competitive with state-of-the-art methods.
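The mixture structure can be sketched as follows: the hop pattern selects a mixture over clusters, and each cluster owns a distribution over semantic role labels. This is a minimal illustration with made-up sizes and random stand-in parameters; the actual parser scores each predicate-argument pair with a neural network, and only the mixture decomposition is kept here.

```python
import numpy as np

# Hypothetical sizes: 4 hop-pattern buckets, 2 clusters, 3 labels
# (two semantic roles plus the non-relation label).
rng = np.random.default_rng(0)
n_patterns, n_clusters, n_labels = 4, 2, 3

# Stand-ins for learned parameters: each hop pattern mixes the clusters,
# and each cluster owns a label distribution.
mix_logits = rng.normal(size=(n_patterns, n_clusters))
label_logits = rng.normal(size=(n_clusters, n_labels))

def softmax(x, axis=-1):
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def label_distribution(pattern_id):
    """p(label | pattern) = sum_k p(k | pattern) * p(label | k)."""
    mix = softmax(mix_logits[pattern_id])   # (n_clusters,)
    comp = softmax(label_logits, axis=-1)   # (n_clusters, n_labels)
    return mix @ comp                       # (n_labels,)

dist = label_distribution(0)
print(round(dist.sum(), 6))  # 1.0: a valid distribution over labels
```

Because the mixture weights depend only on the hop pattern, patterns with similar label behavior can share a cluster, which is exactly the clustering effect discussed below.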

Modeling Distribution Shift with Mixture Models
In this section, we present analyses supporting our interpretation and explain the motivation behind using a mixture model. We interpret the dependency correlation as a label distribution shift with SSDP hop patterns. The SSDP, a salient feature for exploiting the dependency correlation, is the shortest path connecting the predicate and the argument in the syntactic dependency structure (Figure 1). Hop patterns (α, β) count the number of transitions on the path from the predicate to the argument: α counts the dependent-to-head transitions, which go in the opposite direction to the syntactic dependency arcs, while β counts the head-to-dependent transitions, which go in the same direction. The counts reflect the syntactic distance between the predicate and the argument. We refer readers to the paper for more technical details about SSDPs and hop patterns.
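The hop pattern of an SSDP can be computed by walking the dependency tree from the predicate to the argument and counting transitions in each direction. The sketch below assumes the tree is given as a simple head map from token index to head index; names are illustrative, not from the paper.

```python
from collections import deque

def hop_pattern(heads, predicate, argument):
    """Compute the hop pattern (alpha, beta) of the shortest syntactic
    dependency path (SSDP) between a predicate and an argument.

    heads: dict mapping each token index to its syntactic head index
           (the root maps to None).
    Returns (alpha, beta): alpha counts dependent-to-head transitions
    (against the arc direction), beta counts head-to-dependent ones.
    """
    # Build an undirected adjacency list, remembering arc directions.
    adj = {}
    for dep, head in heads.items():
        if head is None:
            continue
        adj.setdefault(dep, []).append((head, "up"))    # dep -> its head
        adj.setdefault(head, []).append((dep, "down"))  # head -> its dep
    # Breadth-first search from the predicate to the argument.
    queue = deque([(predicate, 0, 0)])
    seen = {predicate}
    while queue:
        node, alpha, beta = queue.popleft()
        if node == argument:
            return alpha, beta
        for nxt, direction in adj.get(node, []):
            if nxt not in seen:
                seen.add(nxt)
                queue.append((nxt, alpha + (direction == "up"),
                              beta + (direction == "down")))
    return None  # disconnected; should not happen in a well-formed tree

# Toy tree for "The pigeon hardly eliminate(d)...": eliminate(2) is the
# root, hardly(1) and pigeon(3) attach to it, the(0) attaches to pigeon.
heads = {0: 3, 1: 2, 2: None, 3: 2}
print(hop_pattern(heads, 2, 3))  # (0, 1): one head-to-dependent hop
```

In a tree the path between two nodes is unique, so the BFS simply recovers it; the (0, 1) result matches the pattern highlighted in the analysis below.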
The left figure in Figure 2 provides a glimpse of the label distribution shift, with each vertical bar showing the label distribution of one hop pattern. Our mixture model-based semantic parser groups hop patterns with similar label distributions into clusters; meanwhile, it learns the label distribution for each cluster, modeling the variation.
We confirm the label distribution shift using a mutual information analysis. In the analysis, we compute the mutual information for models aware of hop pattern information and for models unaware of such information, and define the gap in mutual information values as the mutual information gain. The left heatmap in Figure 3 shows the mutual information gain for each hop pattern. We see that long patterns have a near-zero information gain, whereas short patterns show varying degrees of information gain, with the (0, 1) pattern having the highest.
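One simple way to approximate a per-pattern gain is to measure how far the label distribution conditioned on a hop pattern deviates from the marginal label distribution, i.e. each pattern's KL-divergence contribution to I(pattern; label). This is a toy sketch on fabricated counts, not the paper's exact formulation:

```python
import math
from collections import Counter

def per_pattern_information_gain(samples):
    """Estimate each hop pattern's contribution to I(pattern; label)
    as KL(P(label | pattern) || P(label)).

    samples: list of (hop_pattern, label) pairs, where label is a
    semantic role or the special non-relation label "NONE".
    """
    label_counts = Counter(label for _, label in samples)
    total = len(samples)
    marginal = {l: c / total for l, c in label_counts.items()}

    by_pattern = {}
    for pattern, label in samples:
        by_pattern.setdefault(pattern, []).append(label)

    gains = {}
    for pattern, labels in by_pattern.items():
        cond = Counter(labels)
        n = len(labels)
        gains[pattern] = sum(
            (c / n) * math.log2((c / n) / marginal[l])
            for l, c in cond.items()
        )
    return gains

# Fabricated data: the short (0, 1) pattern skews towards real roles,
# while the long (2, 3) pattern is dominated by the non-relation label
# (as is the marginal), so (0, 1) receives the larger gain.
samples = ([((0, 1), "ARG0")] * 8 + [((0, 1), "NONE")] * 2
           + [((2, 3), "NONE")] * 19 + [((2, 3), "ARG0")] * 1)
gains = per_pattern_information_gain(samples)
print(gains[(0, 1)] > gains[(2, 3)])  # True
```

Long patterns end up with small gains precisely because their NONE-heavy conditional distribution barely differs from the marginal, mirroring the near-zero gains in the heatmap.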
The mixture model-based semantic parser learns a cluster assignment agreeing with the mutual information analysis. The right table in Figure 3 illustrates the learned cluster assignment.
We see that the model assigns one cluster to the long patterns, which share a distribution dominated by the non-relation label, and assigns different clusters to the short patterns. The model also assigns a unique cluster to the (0, 1) pattern, the pattern with the most distinctive label distribution and the highest information gain. The agreement between the label distribution visualization, the mutual information analysis, and the learned cluster assignment supports our interpretation that semantic label distributions shift with hop patterns.

Discussion
The abundance of parallelisms between the two types of dependencies indicates a strong correlation between syntactic and semantic dependencies. The parallelism suggests that we can extract a large portion of semantic dependencies using syntactic dependencies alone, building up a high expectation that syntactic dependencies provide plenty of information for extracting semantic dependencies.
Despite the strong correlation, the impact of syntactic information has decreased over time. This gap between the expectation and the reality necessitates the study of the dependency correlation, and more specifically, of the statistical property that underlies it. A deeper understanding of the statistical property would help us better utilize the correlation.
Our interpretation is a generalization of a widely-adopted co-occurrence bias. The bias suggests that semantic dependencies co-occur mainly with short syntactic dependencies and are unlikely to co-occur with long ones. We showed that the long hop patterns have near-zero mutual information gain and share a label distribution dominated by the non-relation label.
Previous methods adopt this bias, focusing on short dependencies rather than long ones. However, our interpretation is only a small step towards a deeper study of the correlation. The interpretation uses only the hop pattern, a feature reflecting the length of syntactic dependencies. Many other factors, such as the syntactic dependency label, also play important roles in the correlation. A more comprehensive analysis is needed for a better understanding of the correlation.