Improved Decomposition Strategy for Joint Entity and Relation Extraction

The joint entity and relation extraction task detects entity pairs along with their relations to extract relational triplets. A recent study (Yu et al. 2020) proposed a novel decomposition strategy that splits the task into two interrelated subtasks: detection of the head-entity (HE) and identification of the corresponding tail-entity and relation (TER) for each extracted head-entity. However, this strategy suffers from two major problems. First, if the HE detection task fails to find a valid head-entity, the model will then miss all related triplets containing this head-entity in the head role. Second, as Yu et al. (2020) stated, their model cannot solve the entity pair overlap (EPO) problem: for a given head-entity, the TER extraction task predicts only a single relation between the head-entity and a tail-entity, even though this entity pair can hold multiple relations. To address these problems, we propose an improved decomposition strategy that considers each extracted entity in two roles (head and tail) and allows a model to predict multiple relations (if any) of an entity pair. In addition, a corresponding model framework is presented to deploy our new decomposition strategy. Experimental results show that our approach significantly outperforms the previous approach of Yu et al. (2020) and achieves state-of-the-art performance on two benchmark datasets.


Introduction
The extraction of relational triplets is a critical and challenging task in natural language processing (NLP). Given an unstructured text, it aims to extract pairs of entities with semantic relations, in the form of (head, relation, tail). Relational triplet extraction has attracted considerable research effort as it plays a vital role in many NLP applications such as information extraction (Tran et al. 2021) and question answering. For example, in information extraction, given a biomedical text, it is expected to extract both the biomedical entities and the semantic relations between them. Early studies handled this task in a pipeline manner, running named entity recognition (NER) first and then relation classification (RC); such pipeline methods suffer from error propagation and ignore the relevance between the two subtasks. To address these problems, subsequent studies proposed joint learning of entities and relations in a single model, including feature-based models (Yu and Lam 2010; Li and Ji 2014; Ren et al. 2017) and neural network-based models (Gupta et al. 2016; Katiyar and Cardie 2017; Zeng et al. 2018; Fu et al. 2019; Yu et al. 2020).
One of the biggest challenges of this task is the overlapping triplet problem, which is expressed in two scenarios: entity pair overlap (EPO) and single entity overlap (SEO). Specifically, EPO occurs when triplets share the same entity pair but with different relations, such as: ("Paris", "Capital of", "France"), ("Paris", "Located in", "France"), and ("Paris", "Administrative division of", "France"), as shown in Figure 1. SEO occurs when two relational triplets share only one common entity, such as: ("John Smiths", "Work in", "Paris") and ("John Smiths", "Live in", "France").
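As a concrete illustration of these two definitions, the following minimal Python sketch (names and logic are ours, not the authors') classifies a sample by its overlap type based on its gold triplets:

```python
# Hypothetical sketch: classify a sample as Normal / EPO / SEO from its gold
# triplets. EPO: two triplets share the same ordered entity pair but differ in
# relation; SEO: two triplets share exactly one entity.
from itertools import combinations

def overlap_type(triplets):
    """triplets: list of (head, relation, tail) tuples for one sample."""
    epo = any(
        (h1, t1) == (h2, t2) and r1 != r2
        for (h1, r1, t1), (h2, r2, t2) in combinations(triplets, 2)
    )
    seo = any(
        len({h1, t1} & {h2, t2}) == 1
        for (h1, r1, t1), (h2, r2, t2) in combinations(triplets, 2)
    )
    if epo:
        return "EPO"
    if seo:
        return "SEO"
    return "Normal"
```

Note that a real sample can satisfy both conditions at once; this sketch simply reports EPO first, whereas the paper's evaluation assigns a sample to each matching category.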
Most previous works could not efficiently address the overlapping triplet problem. The problem directly challenges conventional sequence tagging schemes, in which each token carries only a single tag (Zheng et al. 2017). It also creates significant difficulties for traditional RC approaches, in which an entity pair is assumed to hold at most one relation (Miwa and Bansal 2016). Zeng et al. (2018) were among the first to tackle the problem, proposing a sequence-to-sequence model with a copy mechanism. Fu et al. (2019) utilized a graph convolutional network to extract overlapping triplets. In contrast to these works, Yu et al. (2020) presented a unified sequence labeling framework based on a novel decomposition strategy. However, as Yu et al. (2020) themselves stated, this method can deal only with the SEO triplets in a sample and fails to handle the EPO cases.
Specifically, Yu et al. (2020) decomposed the joint task into two subtasks: head-entity extraction and tail-entity relation extraction. The first task detects all head-entities, whereas the second one detects the corresponding tail-entities and target relations for a given head-entity.
Although this method significantly outperforms previous methods, it suffers from two issues.
First, to create relational triplets, it always detects head-entities first and then extracts the corresponding tail-entities and relations for each detected head-entity. Observably, if the first task fails to find a valid head-entity, the model will then miss all the related triplets containing this head-entity in the head role. Second, as Yu et al. (2020) stated, their model cannot solve the overlapping triplet problem in the EPO scenario. For a given head-entity, the second task predicts only a single relation between the given head-entity and any corresponding tail-entity, even though this entity pair can hold multiple relations. Therefore, we propose an improved decomposition strategy to overcome these two problems.
For the first issue, we designed a more flexible strategy. We detect all entities first, and then, for each extracted entity, we identify it in each entity role (head/tail) and extract the corresponding tail-entities/head-entities and relations. For the second issue, we define a set of "unified relation labels" (URLs), each of which represents a unique (unordered) subset of the full set of original relations. By using these URLs in a multiclass classifier, our model can solve the EPO problem. In addition, a corresponding model framework is introduced to deploy our new strategy. The experimental results on both benchmark datasets showed that our approach significantly outperformed the previous approach of Yu et al. (2020) as well as previous state-of-the-art approaches.

Methodology
In this section, we first introduce the decomposition strategy of Yu et al. (2020) and then present our new strategy. In addition, a corresponding model framework is proposed for deploying our decomposition strategy.

Yu et al. (2020) decomposed the joint extraction task into two interrelated subtasks: Head-Entity (HE) extraction and Tail-Entity Relation (TER) extraction. The HE extraction task is modeled as two sequence labeling tasks, one identifying the start positions and the other the end positions of the head-entities. The entity type is also labeled simultaneously at the head-entity positions. Meanwhile, for each identified head-entity, the TER extraction task is likewise modeled as two sequence labeling tasks, one detecting the start positions and the other the end positions of the corresponding tail-entities. As is done for the HE detection, the relation type between the given head-entity and its corresponding tail-entity is also labeled at each position.
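To make the start/end sequence labeling concrete, here is a minimal illustrative decoder (our own simplification, not the authors' code) that pairs each start tag with the nearest following end tag of the same type to form typed spans:

```python
# Illustrative sketch: decode two position-wise tag sequences (start tags and
# end tags; "O" = no entity boundary) into typed spans. A start position is
# paired with the nearest end position of the same type at or after it.
def decode_spans(start_tags, end_tags):
    spans = []
    for i, s in enumerate(start_tags):
        if s == "O":
            continue
        for j in range(i, len(end_tags)):
            if end_tags[j] == s:          # same type closes the span
                spans.append((i, j, s))   # (start index, end index, type)
                break
    return spans
```

The same decoding idea applies to the TER tagging, where the tags are relation types instead of entity types.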
For example, in Figure 2(a), for the given head-entity "Paris", the TER tagging labels the tail-entity "France" with the expected relation R3 because of the gold triplet ("Paris", R3, "France"). However, this tagging scheme suffers from the two problems mentioned in Section 1, which hinder further improvement of the system performance. We explain in detail how our new decomposition strategy solves these two issues.
First, to obtain relational triplets in the form of (head, relation, tail), the model of Yu et al. (2020) always detects the HEs first and then extracts the corresponding tail-entities and relations for each detected HE. Following this strategy, if the HE tagging fails to find a valid HE, the model misses all the related triplets. For instance, in Figure 2(a), if "Paris" is not identified as a HE, the model misses all gold triplets containing "Paris" in the head role. Meanwhile, it is not always easy to extract head-entities first for all relations, especially when the relation types are diverse. To deal with this issue, we designed a new strategy, which is illustrated in Figure 2(b). This strategy allows our model not only to learn a probability distribution closer to the gold labels but also to increase the chances of extracting a valid triplet that may be overlooked by the approach of Yu et al. (2020). Specifically, we first extract all entities without differentiating the head/tail roles, using the Entity tagging in our scheme. Then, for each extracted entity, the head/tail entity relation (HTER) tagging considers it in each head/tail role and detects all corresponding tail-entities/head-entities and relation types, respectively. For example, in Figure 2(b), the Entity tagging detects the entities "John Smiths", "Paris", and "France". Then, for the given entity "Paris", the HTER tagging considers it in the head role to identify the tail-entity "France" with the unified relation label (URL) R̂2, and also considers "Paris" in the tail role to recognize the HE "John Smiths" with the URL R̂1.
Second, the previous tagging scheme cannot solve the EPO problem. For instance, in Figure 2(a), the entity pair ("Paris", "France") holds multiple relations: R3, R4, and R5. However, for the given HE "Paris", the TER tagging can predict only one of the original relations to the tail-entity "France", using a multi-class classifier of (N_R + 1) classes, which comprise the N_R original relations and one special class, No relation. To overcome this limitation, we propose two different solutions. First, in a natural way, we use a multi-label classifier to detect the multiple relations (if any) between an ordered entity pair. With this solution, each tagging position in the HTER tagging can hold multiple original relation types (if any), instead of at most a single relation type, as assumed by Yu et al. (2020). However, in practice, the maximum number of relation types co-occurring between an entity pair is often small. For instance, the maximum number of relation types for any entity pair is only 3 in both the NYT (Riedel et al. 2010) and WebNLG (Gardent et al. 2017) datasets, whereas the total number of original relations on the NYT and WebNLG datasets is 24 and 216, respectively. Consequently, the sparse label problem on the relation types of the same entity pair can affect the system performance, especially on the WebNLG dataset. Therefore, we propose a second solution that uses a multi-class classifier with a set of URLs to deal with both the sparse label problem and the EPO problem. In essence, the purpose of the created URLs is to transform the "multi-label classification task with a sparse label problem on the set of original relations" into a "multi-class classification task on the set URLs".
Using the training set D and a predefined threshold γ, we create the set URLs following Algorithm 1. Specifically, for each ordered entity pair p in each sample in D, the function F(p) returns a single URL R̂ that represents the unique (unordered) subset of original relations held by p.

Algorithm 1: Creation of a set of "unified relation labels"
Input: D: training dataset; γ: a predefined threshold.
Output: URLs, the expected set of "unified relation labels".

In Table 1, we provide a toy example of creating URLs using Algorithm 1 and of using them on the training set D. Assume that D includes two samples, each with its gold relational triplets. By using Algorithm 1, we obtain the dictionary Q, which contains all the candidate URLs along with their frequencies. With the predefined threshold γ (e.g., γ = 1), we obtain the set URLs: {R̂1, R̂2, R̂3, R̂4}. Then, using the created set URLs, for each entity pair in each sample, we replace all existing original relations of this pair with the single corresponding URL (if any). For instance, in Sample 2, the ordered entity pair ("Alex", "Spain") with the original relations {"Work in", "Place of birth", "Place of death"} becomes ("Alex", R̂4, "Spain"). Finally, our designed model is trained on the training set D with the set URLs.
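Algorithm 1 itself is not reproduced verbatim in this excerpt; the following sketch implements our reading of it, with all function and variable names our own:

```python
# Sketch of Algorithm 1 as we understand it from the text: each ordered entity
# pair's (unordered) set of gold relations becomes one candidate "unified
# relation label" (URL); labels whose frequency over the training set reaches
# the threshold gamma are kept.
from collections import Counter

def build_urls(training_samples, gamma=1):
    """training_samples: list of lists of (head, relation, tail) gold triplets."""
    q = Counter()                               # dictionary Q of URL frequencies
    for triplets in training_samples:
        pairs = {}
        for h, r, t in triplets:                # group relations by ordered pair
            pairs.setdefault((h, t), set()).add(r)
        for rel_set in pairs.values():
            q[frozenset(rel_set)] += 1          # F(p): pair -> unordered subset
    return {rs for rs, freq in q.items() if freq >= gamma}
```

With a sufficiently high γ, rare relation combinations are filtered out, keeping the URL set compact.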

[Table 1: A toy example of creating the set URLs with Algorithm 1 and applying them to the training set D. Sample 1: "Harry works as an artist in Rome, the capital of Italy."]

Network Structure
Following our tagging scheme in Figure 2(b), we present our corresponding model framework in Figure 3. It consists of three main parts: Encoding Layer, Entity Extractor, and HTER Extractor.

Encoding Layer
Given a sample X = {x_1, x_2, ..., x_N} with N tokens, we first utilize a bidirectional long short-term memory (BiLSTM) (Hochreiter and Schmidhuber 1997) network to encode the contextualized representation of each token. The initial embedding e_i of each input token is the concatenation of three parts: a pre-trained word embedding, a character-level word embedding generated by a convolutional neural network (CNN) over the character sequence of x_i, and a part-of-speech (POS) embedding. Then, the contextualized representation sequence H = {h_1, h_2, ..., h_N} is obtained as follows:

h_i = [LSTM_f(e_1, ..., e_i); LSTM_b(e_i, ..., e_N)]

where LSTM_f and LSTM_b denote the forward and backward LSTM, respectively.

[Fig. 3: Our framework, using the same input sample as in Figures 1 and 2. The extracted entity "Paris" is entered into the HTER Extractor as prior knowledge. In the HTER Extractor, R̂1 and R̂2 are in the set URLs created using Algorithm 1, where R̂1: {"Live in", "Work in"} and R̂2: {"Capital of", "Located in", "Administrative division of"}. Note that the HTER Extractor was trained with the set URLs, instead of with the set of original relations.]

Entity Extractor
The Entity Extractor module aims to recognize the relevant entities in the input sample by directly decoding the output sequence H of the Encoding Layer. Specifically, it adopts two identical multiclass classifiers to detect the start and end positions of the entities with the corresponding entity type label. Formally, the detailed operations of the entity tagging on each token are as follows:

p_i^{start_ent} = softmax(W_{start} h_i + b_{start})
p_i^{end_ent} = softmax(W_{end} h_i + b_{end})

where p_i^{start_ent} and p_i^{end_ent} represent the probabilities of the entity type labels for the i-th token when it is considered as the start and end position of an entity, respectively. In addition, h_i is the encoded representation, W_(.) is a trainable weight matrix, and b_(.) is a bias term.
We define the training loss (to be minimized) of the Entity Extractor as the sum of the negative log probabilities of the true start and end tags under the predicted distributions:

L_ent = -(1/N) Σ_{i=1}^{N} [ log p_i^{start_ent}(ŷ_i^{start_ent}) + log p_i^{end_ent}(ŷ_i^{end_ent}) ]

where ŷ_i^{start_ent} and ŷ_i^{end_ent} are the true start and end tags (gold labels) of the i-th word in the sample X, respectively, and N is the length of the sample X.
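As a framework-free illustration of this loss (with toy logits and our own naming), the per-token softmax probabilities and the summed negative log likelihoods can be computed as:

```python
# Toy illustration of the Entity Extractor loss: two classifiers per token
# (start and end), loss = average over tokens of the negative log probabilities
# of the gold start/end tags. All logits here are made up for demonstration.
import math

def softmax(logits):
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    s = sum(exps)
    return [e / s for e in exps]

def entity_loss(start_logits, end_logits, gold_start, gold_end):
    """start_logits/end_logits: per-token lists of logits over tag classes;
    gold_start/gold_end: per-token gold class indices."""
    loss = 0.0
    for logits, gold in zip(start_logits, gold_start):
        loss -= math.log(softmax(logits)[gold])
    for logits, gold in zip(end_logits, gold_end):
        loss -= math.log(softmax(logits)[gold])
    return loss / len(gold_start)   # normalization by N is our assumption
```

Whether the paper normalizes by the sample length N is not stated explicitly in this excerpt; the sketch assumes it does.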

HTER Extractor
The HTER Extractor consists of two submodules: the Head-Entity Relation (HER) extractor and the Tail-Entity Relation (TER) extractor. For each given entity, e.g., "Paris", it uses the TER submodule to identify "Paris" in the head-entity role and detect all the corresponding tail-entities and URLs, such as ("Paris", R̂2, "France"), where R̂2: {"Capital of", "Located in", "Administrative division of"}. At the same time, the HTER Extractor utilizes the HER submodule to identify "Paris" in the tail-entity role and detect all the corresponding head-entities and URLs, such as ("John Smiths", R̂1, "Paris"), where R̂1: {"Live in", "Work in"}.
Specifically, from the output sequence H of the Encoding Layer, as an entity is often composed of multiple tokens, we create a span feature representation for the given entity. Following Ouchi et al. (2018), for the entity with start and end positions j and k (j ≤ k), we obtain the entity representation vector v_ent as follows:

v_ent = (1 / (k - j + 1)) Σ_{i=j}^{k} h_i

where i refers to the position of the i-th word in the input sample.
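The span-pooling step can be sketched in plain Python; mean pooling over the span is our reading here, not necessarily the exact formula of Ouchi et al. (2018):

```python
# Sketch (our assumption): the entity representation v_ent as the mean of the
# token vectors h_j..h_k over the entity span.
def span_representation(h, j, k):
    """h: list of token vectors (lists of floats); j <= k are span boundaries."""
    width = k - j + 1
    dim = len(h[0])
    return [sum(h[i][d] for i in range(j, k + 1)) / width for d in range(dim)]
```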
Because the information of a given entity is crucial for extracting related triplets, we concatenate each token vector in the output sequence H with the given entity representation v_ent.
We take the resulting sequence {[h_1; v_ent], ..., [h_N; v_ent]} as the input to another BiLSTM layer, which fuses each h_i and v_ent into a single vector h'_i:

h'_i = BiLSTM([h_i; v_ent])

Then, the fused sequence H' = {h'_1, ..., h'_N} is used as the shared input to both the TER and HER submodules. The TER submodule detects all the corresponding tail-entities and relations by directly decoding the sequence H'. Specifically, it uses two identical multiclass classifiers to detect the start and end positions of the related tail-entities with the corresponding relation type.
Thus, the detailed operations of the tail-entity tagging with the relation type on each token are as follows:

p_i^{start_tail} = softmax(W_{start_tail} h'_i + b_{start_tail})
p_i^{end_tail} = softmax(W_{end_tail} h'_i + b_{end_tail})

Similarly, the HER submodule utilizes two other identical multi-class classifiers to detect the start and end positions of the related head-entities with the corresponding relation type. Formally, the detailed operations of the head-entity tagging on each token are as follows:

p_i^{start_head} = softmax(W_{start_head} h'_i + b_{start_head})
p_i^{end_head} = softmax(W_{end_head} h'_i + b_{end_head})

Therefore, we have the loss function of each submodule in the HTER Extractor as follows:

L_ter = -(1/N) Σ_{i=1}^{N} [ log p_i^{start_tail}(ŷ_i^{start_tail}) + log p_i^{end_tail}(ŷ_i^{end_tail}) ]   (13)
L_her = -(1/N) Σ_{i=1}^{N} [ log p_i^{start_head}(ŷ_i^{start_head}) + log p_i^{end_head}(ŷ_i^{end_head}) ]   (14)

where N is the length of the input sample; ŷ_i^{start_tail} and ŷ_i^{end_tail} in Eq. 13 are the true start and end relation tags of the i-th word annotating the related tail-entities, respectively, and ŷ_i^{start_head} and ŷ_i^{end_head} in Eq. 14 are the true start and end relation tags of the i-th word annotating the related head-entities.

Joint Learning
To boost the interaction between the Entity Extractor and the HTER Extractor, we combine their loss functions to form the entire loss objective L(θ) of our model as a weighted sum, in which the hyper-parameter α is fine-tuned in the range (0, 1]. Then, we train the model by minimizing L(θ) with the Adam stochastic gradient descent (Kingma and Ba 2014) over shuffled mini-batches. Note that the HTER Extractor is trained with the gold set URLs, which are created using Algorithm 1, instead of with the set of original relations.
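Schematically, and only as an assumption about the exact weighting (the text specifies a single hyper-parameter α in (0, 1] but the formula is not reproduced in this excerpt), the combination could look like:

```python
# Hypothetical sketch of the joint objective: one weighting among several
# plausible ones; the exact formula is not given in this excerpt.
def joint_loss(l_ent, l_ter, l_her, alpha=0.5):
    """Combine the Entity Extractor loss with the two HTER submodule losses."""
    return alpha * l_ent + (l_ter + l_her)
```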

Inference
In the testing phase, the triplets can be easily inferred on the basis of the two modules.
Specifically, for each input sample, we first extract the entities by using the Entity Extractor module. Note that entities extracted by this module are not imposed as an additional constraint on the output of the other module. Then, for each detected entity, we utilize the HTER Extractor to consider it in the head/tail roles and extract all the relational triplets involving this entity. For example, from the input sample in Figure 3, the Entity Extractor is expected to detect the entities "John Smiths", "Paris", and "France". Then, for each extracted entity, e.g., "Paris", the HTER Extractor uses its two submodules (HER and TER) to extract all relational triplets containing "Paris". Specifically, the TER submodule identifies "Paris" in the head role and extracts ("Paris", R̂2, "France"). Meanwhile, the HER submodule considers "Paris" in the tail role and extracts ("John Smiths", R̂1, "Paris").
Note that the relation types in the triplets extracted by both the HER and TER submodules belong to the set URLs because they are trained with this set. Thus, we need to transform these relations into the original relations by breaking each URL down into its original relations and creating the corresponding triplets. In the above example, for the given entity "Paris", the TER submodule extracts the triplet ("Paris", R̂2, "France"). As shown in Figure 2, R̂2 represents the subset {"Capital of", "Located in", "Administrative division of"}. Thus, by breaking R̂2 down, we obtain the final triplets ("Paris", "Capital of", "France"), ("Paris", "Located in", "France"), and ("Paris", "Administrative division of", "France").
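The URL-to-original-relation expansion at inference time can be sketched as follows (a minimal illustration with our own naming):

```python
# Sketch of the inference-time post-processing: each predicted triplet whose
# relation is a URL (a set of original relations) is broken down into one
# triplet per original relation; duplicates are dropped via the set.
def expand_url_triplets(triplets):
    """triplets: iterable of (head, url, tail), url = frozenset of relations."""
    out = set()
    for h, url, t in triplets:
        for r in url:
            out.add((h, r, t))
    return out
```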

Datasets and Evaluation Metrics
Following the previous work (Dai et al. 2019;Yu et al. 2020), we evaluated our approach on two widely used datasets: NYT (Riedel et al. 2010) and WebNLG (Gardent et al. 2017). To further study the capability of our approach to extract overlapping and multiple relations, we also split the test set into three categories: Normal, EPO, and SEO. A sample belongs to Normal if none of its triplets overlaps, whereas it belongs to EPO if some of its triplets share the same entity pair. In addition, a sample belongs to SEO if some of its triplets share only one common entity. The statistics of the two datasets are given in Table 2.
We report the standard micro precision, recall, and F1-score, in line with recent studies. In addition, the relation number of the WebNLG dataset was miswritten as 246 in previous work (Fu et al. 2019; Yu et al. 2020); that figure is the total number of relations in the original WebNLG dataset rather than the number in the subset they used. We recounted and provide the correct number.
Specifically, a predicted triplet is correct if and only if its relation type and its two corresponding entities are all the same as those in the gold standard annotation. The results on the test set are reported for the model that achieved the best result on the development set.
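The exact-match evaluation described above can be sketched as:

```python
# Sketch of the exact-match evaluation: a predicted triplet counts as correct
# only if head, relation, and tail all match a gold triplet; micro precision,
# recall, and F1 follow from the global counts.
def micro_prf(pred, gold):
    """pred, gold: sets of (head, relation, tail) triplets over the test set."""
    correct = len(pred & gold)
    p = correct / len(pred) if pred else 0.0
    r = correct / len(gold) if gold else 0.0
    f1 = 2 * p * r / (p + r) if p + r else 0.0
    return p, r, f1
```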

Implementation Details
We implemented the neural networks using the PyTorch library. Batch padding was applied to pad all sequences in a batch to the maximum length in that batch. The mini-batch size was set to 64, selected from the set {32, 50, 64}.
We used 300-dimensional GloVe (Pennington et al. 2014) word embeddings. All experiments were run on a Tesla V100 graphics card in an Ubuntu-based computer system.

Comparison Models
For comparison, we employed the following models as baselines:
• NovelTagging (Zheng et al. 2017): The first model to introduce a novel tagging scheme that transforms the joint extraction task into a sequence labeling problem.
• MultiDecoder (Zeng et al. 2018): A seq2seq model with a copy mechanism that converts the joint extraction task into a sequence-to-sequence problem.
• MultiHead (Bekoulis et al. 2018): A joint neural model that performs entity recognition and relation extraction simultaneously.
• GraphRel (Fu et al. 2019): An end-to-end relation extraction model that uses GCNs to jointly learn named entities and relations.
• OrderRL (Zeng et al. 2019): A sequence-to-sequence model with reinforcement learning that takes the extraction order into consideration.
• ETL-Span (Yu et al. 2020): A sequence labeling framework based on a novel decomposition strategy that has achieved a notable performance; however, its decomposition strategy still cannot solve the EPO problem, as the authors stated.

Analysis of Our Decomposition Strategy
To gain more insight into the improvement brought by our decomposition strategy (in the model of Figure 3), we conducted further experiments, as reported in Table 4. We also reproduced the results of ETL-Span (Yu et al. 2020).

First, for case (a) in Table 4, we considered our model without using the set URLs. The strategy of Yu et al. (2020) strictly requires detecting the head-entities first, which is not always easy for all relations, as in some cases it might be easier to detect the tail-entities before the head-entities. Our flexible approach overcomes this problem and significantly improves the recall. Note that our approach achieved a larger improvement in the F1-score on the WebNLG than on the NYT. One possible reason is that, because the number of relation types in the WebNLG (216 types) is much larger than that in the NYT (only 24 types), the WebNLG contains a higher proportion of relations for which it is easier to detect the tail-entity first.

Second, for case (b), we used the first solution, a multi-label classifier over the original relations, in the HTER Extractor. However, the maximum number of relations of the same entity pair is only 3 on both training sets, while the total number of original relations is much larger.
Consequently, the sparse label problem of the multilabel classification on the same entity pair is more severe in the WebNLG than in the NYT. Therefore, it considerably affected the system performance on the WebNLG. Meanwhile, although this problem is less severe in the NYT than in the WebNLG, it also hinders the further improvement of the system performance.
Finally, as our model suffers from the sparse label problem for multilabel classification of the same entity pair in case (b), we considered the second solution to solve the EPO problem.
Specifically, in case (c), because a multiclass classification can alleviate the sparse label problem, we used multiclass classifiers with the URLs created using Algorithm 1 in the HTER Extractor.
Interestingly, by using this simple solution, we achieved the highest system performance on both the NYT and WebNLG. Compared with case (a), the solution increased the F1-score by 6.2 and 0.9 points on the NYT and WebNLG, respectively. It is worth mentioning that the improvement on the NYT was significantly larger than that on the WebNLG. One possible reason is that the EPO problem is more serious on the NYT than on the WebNLG, as shown in Table 2. Compared with the ETL-Span model of Yu et al. (2020), our best model (case (c) in Table 4) achieved a significant improvement, increasing the F1-score by 7.3 and 3.0 points on the NYT and WebNLG test sets, respectively. In addition, on the NYT test set, although our best model boosted the recall significantly by 16%, the precision decreased by 2%. One possible reason for this decrease is that our model tries to train all three parts (i.e., the Entity Extractor and the two submodules, TER and HER) effectively at the same time, which might be more challenging than training only two elements simultaneously (i.e., the HE Extractor and TER Extractor), as in the ETL-Span model of Yu et al. (2020). In future work, we plan to design the model architecture more effectively to obtain a satisfactory level of both recall and precision, thereby further improving the F1-score.

Analysis of Different Sample Types
To verify the capability of our model to extract multiple triplets, we followed the procedure in (Zeng et al. 2018;Fu et al. 2019) and conducted further experiments on the NYT test set.
Specifically, we first split the samples in this test set into three categories: Normal, EPO, and SEO, and then we investigated the performance of each category.
The results are shown in Figure 4. It can be seen from the figure that our model achieved the best performance in all three categories. In addition, we paid special attention to the performance differences between our approach and that of Yu et al. (2020). Notably, on the NYT test set, our approach boosted the F1-score in the EPO category by 27.0 points, whereas the approach of Yu et al. (2020) cannot solve this problem at all. In addition, because the strategy of Yu et al. (2020) strictly constrains the detection to head-entities first, if it fails to find a valid HE, it misses the related triplets; this is more serious when the HE appears in many different triplets in the head-entity role (a typical SEO case). Our flexible approach deals with this issue and substantially improves the F1-score by 3.4 points in the SEO category.
We also compared the ability of the models to extract multiple triplets in a sample. Specifically, we divided the samples of the NYT test set into five categories, where each category contains samples that have 1, 2, 3, 4, or ≥ 5 triplets, respectively. The results are shown in Figure 5. It can be seen from the figure that our approach achieved a significant improvement in extracting multiple triplets compared with the other models. In particular, our model showed a more stable performance when the number of triplets in the sample increased. These results show that our approach is effective in dealing with the multi-relation extraction task.

Case Study
To gain more insight into the effectiveness of our model in overcoming the existing disadvantages in the approach of Yu et al. (2020), we analyzed the prediction outputs of both models on a few samples of the NYT and WebNLG test sets and these are shown in Tables 5 and 6, respectively.

Dealing With the EPO Problem
In Table 5, we show two examples from the NYT test set and compare the predicted triplets of the model of Yu et al. (2020) with those of our model.
As mentioned earlier, the model of Yu et al. (2020) cannot solve the EPO problem. For any entity pair, their model predicts only a single relation, although an entity pair can have multiple relations. For instance, in Sample 1, the ordered entity pair ("Somalia", "Mogadishu") has two relations: "/location/country/capital" and "/location/location/contains". However, the model of Yu et al. (2020) extracted only a single relation and created the triplet ("Somalia", "/location/location/contains", "Mogadishu"). Similarly, in Sample 2, although the ordered entity pair ("Ethiopia", "Addis Ababa") has three relations: "/location/country/capital", "/location/country/administrative divisions", and "/location/location/contains", their model predicted only the relation "/location/location/contains" for this pair. Thus, the more serious the EPO problem is, the more the system performance degrades. Meanwhile, our model overcomes this disadvantage and effectively solves the EPO problem: for both samples above, it fully detected all possible relations for the pair ("Somalia", "Mogadishu") in Sample 1 and for the pair ("Ethiopia", "Addis Ababa") in Sample 2.

Sample 2: "Though officials in Addis Ababa, Ethiopia's capital, have said their troops should not enter downtown Mogadishu, many are camped in the former American Embassy, a decrepit building that was closed more than 15 years ago after American soldiers suffered a humiliating defeat at the hands of warlords."

Effect of the "Exhaustive Search" Strategy
As shown in Figure 3, our model uses the Entity Extractor to detect all entities first. Then, for each detected entity, the HTER Extractor utilizes its two submodules to identify the entity in each head /tail role and extracts all the corresponding tail entities/head entities and relations.
The final output of our model is always obtained by combining the results of the two submodules without any duplicate triplets. In essence, this approach can be considered as an "exhaustive search" strategy that aims to increase the chances of extracting a valid triplet that may be overlooked by the approach of Yu et al. (2020). Therefore, in Table 6, we compare the prediction outputs of both approaches on three samples from the WebNLG test set.
First, in Sample 3, the HE Extractor in the model of Yu et al. (2020) missed the HE "Athens International Airport", thereby overlooking the valid triplet ("Athens International Airport", "cityServed", "Athens") in the ground truth. Meanwhile, in our model, the entity "Athens International Airport" was detected by the Entity Extractor. Then, the TER+URLs submodule of the HTER Extractor identified this entity in the head role and extracted the triplet ("Athens International Airport", "cityServed", "Athens"). Additionally, we compared the outputs of the HER+URLs and TER+URLs submodules in our model. Although two triplets, namely ("Athens", "country", "Greece") and ("Greece", "leaderName", "Alexis Tsipras"), were easily obtained by both submodules, the HER+URLs submodule failed to extract the valid triplet ("Athens International Airport", "cityServed", "Athens") when considering the entity "Athens" in the tail role. Thus, in this example, the TER+URLs submodule achieved a better result than the HER+URLs submodule.
Second, in Sample 4, the approach of Yu et al. (2020) omitted two valid triplets in the ground truth, ("The Secret Scripture", "publisher", "Faber and Faber") and ("Ireland", "location", "Europe"), because the HE Extractor missed the two HEs "The Secret Scripture" and "Ireland". In our model, although the Entity Extractor module detected the entity "The Secret Scripture", the TER+URLs submodule failed to extract the triplet ("The Secret Scripture", "publisher", "Faber and Faber") when considering "The Secret Scripture" in the head role. Meanwhile, thanks to the HER+URLs submodule, our model still recovered this triplet by considering "Faber and Faber" in the tail role and detecting the corresponding HE "The Secret Scripture" with the relation type "publisher". Based on the outputs of the two submodules, it is clear that the HER+URLs submodule yielded a better result for this sample than the TER+URLs submodule.
We further examine, in Table 7, the performance of the predicted outputs of the HER+URLs and TER+URLs submodules of the HTER Extractor on the entire WebNLG test set. The HER+URLs submodule predicted 1,510 triplets, whereas the TER+URLs submodule predicted 1,530. These two submodules share 1,368 common predicted triplets. Thus, the overlap percentage of the output of the HER+URLs submodule is 90.6%, whereas this rate is 89.4% for the output of the TER+URLs submodule. As shown in Table 7, our model achieved the best performance when combining the predicted outputs of the two submodules.
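A quick arithmetic check of the overlap percentages reported for Table 7:

```python
# 1,368 shared triplets out of 1,510 (HER+URLs) and 1,530 (TER+URLs) predictions.
shared, her_total, ter_total = 1368, 1510, 1530
her_overlap = round(100 * shared / her_total, 1)   # overlap rate of HER+URLs output
ter_overlap = round(100 * shared / ter_total, 1)   # overlap rate of TER+URLs output
```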
Based on the analysis of the examples in Table 6 and the performance of the submodules of our model in Table 7, we conclude that the "exhaustive search" strategy of our model is effective in solving the joint entity and relation extraction task.

Impact of Using Pre-trained Language Models
For a fair comparison, following Yu et al. (2020), we did not exploit pretrained language models in our main experiments. BERT, a well-known pretrained language model first proposed by Devlin et al. (2019), has been widely applied to various NLP downstream tasks and has achieved considerable success. For the entity and relation extraction task, Hang et al. (2021) presented a BERT-based model named BERT-JEORE and obtained superior performance. Therefore, we further investigated the impact of using pretrained language models in our model.
Specifically, for our model in Figure 3, we replaced only the first BiLSTM encoder with a pretrained BERT-Base encoder to extract the representations of the original words from the input sample. Note that the BERT model first uses its tokenizer to split each original word into subword tokens (if necessary) and then outputs the vectors of these tokens. We therefore obtained the representation of each original word by averaging the vector of its start token and the vector of its end token. In
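The word-level averaging described above can be sketched as follows. The word-to-token alignment and the vectors are hypothetical stand-ins for the BERT tokenizer output and encoder states, used here only to illustrate the averaging step:

```python
import numpy as np

def word_representations(token_vecs, word_to_tokens):
    """Build one vector per original word by averaging the vectors of its
    start (first) and end (last) subword tokens.

    token_vecs: (num_tokens, hidden) array of subword token vectors.
    word_to_tokens: list of (start, end) token index pairs, one per word.
    """
    reps = []
    for start, end in word_to_tokens:
        reps.append((token_vecs[start] + token_vecs[end]) / 2.0)
    return np.stack(reps)

# Toy example: 3 words split into 5 subword tokens.
vecs = np.arange(10, dtype=float).reshape(5, 2)   # fake encoder outputs
alignment = [(0, 0), (1, 2), (3, 4)]              # word -> (start, end) token indices
reps = word_representations(vecs, alignment)
print(reps.shape)  # (3, 2)
```

For a word that is not split by the tokenizer, start and end coincide and the average is simply that token's vector.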

Related Works
Researchers have made great efforts to extract relational triplets from unstructured text, which can be directly used for automatic knowledge graph construction. Early works (Zelenko et al. 2003; Zhou et al. 2005; Chan and Roth 2011) treated the joint extraction task in a pipeline manner. They extracted relational triplets in two isolated steps: (1) running NER on the input sample to recognize all entities, and (2) running RC on all pairs of the extracted entities. Although these pipeline methods are quite simple, they usually suffer from the error propagation problem and ignore the relevance between the two steps.
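A minimal sketch of this pipeline setting is shown below; `ner` and `rc` are hypothetical stand-ins for arbitrary trained components, not any specific system:

```python
def pipeline_extract(text, ner, rc):
    """Two-step pipeline: (1) NER recognizes all entities, then
    (2) relation classification (RC) runs on every ordered pair of
    extracted entities. An NER miss silently drops every triplet
    involving that entity, illustrating the error-propagation problem."""
    entities = ner(text)
    triplets = []
    for head in entities:
        for tail in entities:
            if head == tail:
                continue
            relation = rc(text, head, tail)   # None if no relation predicted
            if relation is not None:
                triplets.append((head, relation, tail))
    return triplets

# Toy components for illustration only.
toy_ner = lambda text: ["Athens", "Greece"]
toy_rc = lambda text, h, t: "country" if (h, t) == ("Athens", "Greece") else None
print(pipeline_extract("Athens is the capital of Greece.", toy_ner, toy_rc))
# [('Athens', 'country', 'Greece')]
```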
To ease the two issues above, subsequent works attempted to build joint learning models that learn entities and relations simultaneously in a single model. They can be divided into two main approaches: feature-based models (Yu and Lam 2010; Li and Ji 2014; Ren et al. 2017) and neural network-based models (Gupta et al. 2016; Katiyar and Cardie 2017; Zeng et al. 2018; Fu et al. 2019; Yu et al. 2020).

Conclusion and Future Work
This paper proposes a new decomposition strategy along with a corresponding model framework for the joint entity and relation extraction task. Our approach mainly focuses on solving the overlapping triplet problem, one of the biggest challenges of this task, as only a few existing works can tackle it effectively. Our model uses one module to extract all the relevant entities; for each extracted entity, another module considers both its head and tail roles and extracts all the related triplets. In addition, the use of URLs helps to effectively handle the sparse label problem of relation types within the same entity pair (e.g., EPO cases), which is prevalent in this task. Experimental results on two widely used datasets (NYT and WebNLG) showed that our model achieved notable performance compared with a recent work (Hang et al. 2021). Further analysis experiments demonstrated the effectiveness of our approach in handling overlapping and multiple triplet extraction scenarios.
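In outline, the proposed decomposition can be summarized by the following sketch, where `extract_entities`, `extract_as_head`, and `extract_as_tail` are hypothetical stand-ins for the Entity Extractor and the two role-specific submodules of the HTER Extractor:

```python
def joint_extract(text, extract_entities, extract_as_head, extract_as_tail):
    """Improved decomposition: every extracted entity is considered in both
    the head and the tail role, and each role-specific extractor may return
    multiple relations for the same entity pair (covering EPO cases).
    Triplets found by both roles are deduplicated by the set."""
    triplets = set()
    for entity in extract_entities(text):
        # entity in the head role: find (relation, tail) pairs
        for relation, tail in extract_as_head(text, entity):
            triplets.add((entity, relation, tail))
        # entity in the tail role: find (head, relation) pairs
        for head, relation in extract_as_tail(text, entity):
            triplets.add((head, relation, entity))
    return triplets

# Toy extractors for illustration only.
toy_entities = lambda text: ["Athens", "Greece"]
toy_as_head = lambda text, e: [("country", "Greece")] if e == "Athens" else []
toy_as_tail = lambda text, e: [("Athens", "country")] if e == "Greece" else []
print(joint_extract("Athens is in Greece.", toy_entities, toy_as_head, toy_as_tail))
# {('Athens', 'country', 'Greece')}
```

Because each triplet can be recovered from either role, a miss by one role-specific extractor (as in the Sample 4 analysis above) can still be compensated by the other.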
Our proposed methodology has considerable potential for practical NLP applications such as information extraction, knowledge base population, and question answering. Moreover, the idea of using URLs may be relevant and promising for multilabel classification problems in general, not just for a specific task such as entity and relation extraction. In future work, we plan to apply this idea to the text classification task. We would also like to explore other methods for solving the overlapping triplet problem more effectively, such as changing the weight of a label depending on whether it is a subset of the true label, and integrating available knowledge bases of entities into current models to boost system performance.