Graph clustering is a long-standing problem in data mining and machine learning. Traditional graph clustering aims to partition a graph into several densely connected components. However, with the proliferation of rich attribute information for objects in real-world graphs, vertices are often associated with a number of attributes that describe their properties. This gives rise to a new type of graph, the attributed graph. How to leverage both structural and attribute information thus becomes a new challenge for attributed graph clustering. In this paper, we review state-of-the-art studies on clustering large attributed graphs. These methods take different approaches to leveraging both structural and attribute information, so that the resulting clusters have both cohesive intra-cluster structure and homogeneous attribute values.
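The core idea, blending structural closeness with attribute homogeneity, can be illustrated with a minimal sketch. This is not any specific method from the surveyed papers; the similarity measure (a convex combination of neighbor-set Jaccard similarity and the fraction of matching attribute values, weighted by a hypothetical parameter `alpha`) and the toy graph are illustrative assumptions.

```python
def combined_similarity(adj, attrs, u, v, alpha=0.5):
    """Blend structural similarity (Jaccard index of neighbor sets) with
    attribute similarity (fraction of matching attribute values)."""
    nu, nv = adj[u], adj[v]
    union = nu | nv
    structural = len(nu & nv) / len(union) if union else 0.0
    keys = set(attrs[u]) | set(attrs[v])
    attribute = (sum(attrs[u].get(k) == attrs[v].get(k) for k in keys) / len(keys)
                 if keys else 0.0)
    return alpha * structural + (1 - alpha) * attribute

# Toy attributed graph: two dense triangles bridged by edge (2, 3),
# whose vertices also agree on a "topic" attribute within each triangle.
adj = {
    0: {1, 2}, 1: {0, 2}, 2: {0, 1, 3},
    3: {2, 4, 5}, 4: {3, 5}, 5: {3, 4},
}
attrs = {0: {"topic": "db"}, 1: {"topic": "db"}, 2: {"topic": "db"},
         3: {"topic": "ml"}, 4: {"topic": "ml"}, 5: {"topic": "ml"}}
```

Under such a measure, intra-cluster pairs like (0, 1) score strictly higher than the bridge pair (2, 3), which a clustering algorithm could then exploit.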
Many of society's systems depend on information technology (IT), which makes securing the safety of IT systems of the utmost importance. Furthermore, numerous stakeholders (managers, customers, employees, etc.) take part in the decision-making process for risk measures in these IT systems, so a means of communicating risk measures is necessary for stakeholders to easily form a consensus when required. For this purpose, we have developed a Multiple Risk Communicator (MRC) to assist in consensus formation within organizations and a Social-MRC system to support social consensus formation, which we have applied to various problems. This paper describes the considerations that IT system risk communication should take into account, describes the development of the necessary support systems, and reports the results of their application.
It is widely argued that today's largely reactive, “respond and patch” approach to securing cyber systems must yield to a new, more rigorous, more proactive methodology. Achieving this transformation is a difficult challenge. Building on insights into requirements for cyber science and on experience gained through eight years of operation, the DETER project is addressing one facet of this problem: the development of transformative advances in methodology and facilities for experimental cybersecurity research and system evaluation. These advances in experiment design and research methodology are yielding progressive improvements not only in experiment scale, complexity, diversity, and repeatability, but also in the ability of researchers to leverage the prior experimental efforts of others within the community. We describe in this paper the trajectory of the DETER project towards a new experimental science and a transformed facility for cybersecurity research, development, and evaluation.
Recent malware communicates with remote hosts on the Internet to receive C&C commands, update itself, and so on, and its behavior can vary depending on the behavior of those remote hosts. Thus, when analyzing such malware in a sandbox, it is important to focus not only on the behaviors of the malware sample itself but also on those of the remote servers controlled by attackers. A simple way to achieve this is to observe the live sample in an Internet-connected sandbox for a long period of time. However, since we do not know when these servers will send meaningful responses, we must keep the sample running in the sandbox, which is a costly operation. Moreover, leaving live malware in an Internet-connected sandbox increases the risk that its attacks spill out of the sandbox and cause secondary infections. In this paper, we propose a novel sandbox analysis method using a dummy client: an automatically generated lightweight script that interacts with the remote servers in place of the malware sample itself. In the proposed method, we first execute a malware sample in a sandbox connected to both the real Internet and an Internet Emulator. Second, we inspect the traffic observed in the sandbox and filter out high-risk communications. The remaining traffic data is then used by the dummy client to interact with the remote servers in place of the sample and to efficiently collect their responses. The collected server responses are fed back to the Internet Emulator in the sandbox and used to improve the observability of malware sandbox analysis. In experiments with malware samples captured in the wild, we indeed observed a considerable number of changes in the responses from the remote servers obtained by our dummy client. In comparison with a simple Internet-connected sandbox, the proposed sandbox also improved the observability of malware sandbox analysis.
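The dummy-client idea, replaying previously observed requests and detecting when server responses change, can be sketched as follows. This is an illustrative simplification, not the authors' implementation: the request-template format, the injected `fetch` callable, and both function names are assumptions, and a real dummy client would of course issue actual network requests.

```python
import hashlib

def replay_requests(requests, fetch):
    """Replay recorded request templates via the injected `fetch` callable
    and return a digest of each response, keyed by request id, so that
    later replays can be compared against this observation."""
    return {req["id"]: hashlib.sha256(fetch(req).encode()).hexdigest()
            for req in requests}

def changed_responses(old, new):
    """Return the ids whose server response differs from the previous run,
    i.e., servers that have started sending new (potentially meaningful)
    content such as updated C&C commands."""
    return sorted(rid for rid in new if old.get(rid) != new[rid])
```

Injecting `fetch` keeps the replay logic testable offline; in deployment it would be a function that sends the recorded request to the remote server and returns the response body.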
In Vehicular Ad Hoc Networks (VANETs), vehicular scenarios require smart signaling, smart road maintenance, and other services. A brand-new security issue is that semi-trusted Road Side Units (RSUs) may be compromised. In this paper, we propose an Elliptic Curve ElGamal Threshold system-based key management scheme for safeguarding a VANET against compromised RSUs and their collusion with malicious vehicles. We analyze the packet-loss tolerance to demonstrate security performance, followed by a discussion of the threshold. After discussing feasibility with respect to privacy and processing time, we present an overhead analysis for two application scenarios: Emergency Braking Notification (EBN) and Decentralized Floating Car Data (DFCD). Our method improves security with low overhead in EBN and adds no overhead in DFCD.
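The threshold property at the heart of such a scheme, that any t parties can recover a secret key while fewer than t learn nothing, is classically obtained via Shamir secret sharing. The sketch below shows plain (t, n) Shamir sharing over a prime field; it is a generic illustration of the threshold mechanism, not the paper's elliptic-curve ElGamal construction, and the field prime and function names are assumptions.

```python
import random

P = 2**127 - 1  # a Mersenne prime; a toy field modulus for illustration

def share(secret, t, n, rng=random):
    """Split `secret` into n shares such that any t reconstruct it:
    evaluate a random degree-(t-1) polynomial with constant term `secret`."""
    coeffs = [secret] + [rng.randrange(P) for _ in range(t - 1)]
    def f(x):
        return sum(c * pow(x, i, P) for i, c in enumerate(coeffs)) % P
    return [(x, f(x)) for x in range(1, n + 1)]

def reconstruct(shares):
    """Lagrange interpolation at x = 0 recovers the constant term."""
    secret = 0
    for i, (xi, yi) in enumerate(shares):
        num, den = 1, 1
        for j, (xj, _) in enumerate(shares):
            if i != j:
                num = num * (-xj) % P
                den = den * (xi - xj) % P
        secret = (secret + yi * num * pow(den, P - 2, P)) % P
    return secret
```

With t-out-of-n sharing distributed across RSUs, an attacker must compromise at least t of them before any key material is exposed, which is exactly the collusion bound the threshold discussion concerns.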
We propose a length-preserving enciphering scheme that achieves PRP security and streamable decryption; no previous enciphering scheme satisfies both properties. Our scheme is suitable for secure communication over narrowband channels and on memory-constrained devices. Although length-preserving enciphering schemes satisfying SPRP security, which is stronger than PRP security, are known, it is impossible to achieve SPRP security and streamability at the same time: memory to store an entire plaintext/ciphertext would be required. When decryption is performed on memory-constrained devices, PRP security is therefore the strongest achievable notion of security.
It is becoming more and more important to make use of personal or classified information while keeping it confidential. A promising tool for meeting this challenge is secure multi-party computation (MPC). However, one of the biggest problems with MPC is that it requires a vast amount of communication. We analyzed existing MPC protocols and found that the random-number bitwise-sharing protocol used by many of them is notably inefficient. By devising a representation of the truth values and using special-form prime numbers, we propose efficient random-number bitwise-sharing protocols, dubbed “Extended-Range I and II,” which reduce the communication complexity to approximately 1/6th that of the best existing protocol. We further reduced the communication complexity to approximately 1/26th by lowering the abort probability, making the previously necessary backup computation unnecessary. Using our improved protocol, “Lightweight Extended-Range II,” we reduced the communication complexities of equality testing, comparison, interval testing, and bit decomposition, all of which use the random-number bitwise-sharing protocol, by approximately 91%, 79%, 67%, and 23% (for 32-bit data), respectively. We also reduced the communication complexity of private exponentiation by about 70% (for 32-bit data and five parties).
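For readers unfamiliar with the primitive being optimized: a random-number bitwise-sharing protocol leaves the parties holding a sharing of each individual bit of a random value. The sketch below shows only the basic object, additive sharings of bits over a small prime field, not the paper's optimized protocols; the field size, the trusted-dealer style of generation, and all names are illustrative assumptions (real protocols generate the bits jointly, without any party learning them).

```python
import random

P = 101  # small prime field modulus, for illustration only

def share_bit(bit, n_parties, rng=random):
    """Additively share one bit among n parties over Z_P:
    the shares sum to the bit modulo P."""
    shares = [rng.randrange(P) for _ in range(n_parties - 1)]
    shares.append((bit - sum(shares)) % P)
    return shares

def open_shares(shares):
    """Reconstruct a shared value by summing all shares modulo P."""
    return sum(shares) % P

def shared_random_bits(k, n_parties, rng=random):
    """Produce a bitwise sharing of a random k-bit value: one independent
    additive sharing per bit position (dealer-style, for illustration)."""
    bits = [rng.randrange(2) for _ in range(k)]
    return bits, [share_bit(b, n_parties, rng) for b in bits]
```

Holding the bits individually (rather than one sharing of the whole number) is what makes bit-level operations such as comparison and bit decomposition possible, and it is why this subprotocol dominates the communication cost of those operations.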
Machine translation of patent documents is very important from a practical point of view. One of the key technologies for improving machine translation quality is the utilization of syntax. It is difficult to select an appropriate parser for English-to-Japanese patent machine translation because the effect of each parser on patent translation is not clear. This paper provides an empirical comparative evaluation of several state-of-the-art parsers for English, focusing on their effects on patent machine translation from English to Japanese. We add syntax to a method that constrains the reordering of noun phrases in phrase-based statistical machine translation. There are two methods for obtaining noun phrases from input sentences: 1) the input sentence is parsed directly, and 2) noun phrases are determined using the parsing results of the context document that contains the input sentence. We measured how much each parser contributed to improving translation quality under each of the two methods, and how much a combination of parsers contributed under the second method. We conducted experiments using the NTCIR-8 patent translation task dataset. Most of the parsers improved translation quality, and combinations of parsers using the context-document-based method achieved the best translation quality.