2024 Volume 21 Issue Supplemental Article ID: e211014
In collective systems, the influence of an individual can permeate an entire group through indirect interactions, complicating any scheme to infer individual roles from observations. A typical approach to understanding one individual's influence on another involves accounting for confounding factors, for example, by conditioning on other individuals outside of the pair. This becomes infeasible in many cases as the number of individuals increases. In this article, we review some of the unforeseen problems that arise in understanding individual influence in a collective, such as a colony of single cells, as well as some recent works which address these issues using tools from information theory.
This paper highlights the challenges of inferring causal relationships in systems of many agents. We summarize recent developments in applying information theory and image analysis, which make it possible to extract information about individual cell-to-cell relationships, such as their network structure or the domain of interaction. The developed methods are expected to enable us to unveil these relationships for systems of millions or more interacting cells.
Groups can achieve complex behaviors that transcend the ability and complexity of the individuals that constitute the group. This phenomenon is evident in various biological systems, including the aggregation and collective migration of cells [1], the flocking of birds [2], and the schooling of fish [3]. For example, colonies of the amoeba Dictyostelium discoideum (Dd) can aggregate into mounds consisting of thousands of cells upon starvation [4]. Understanding such “emergent behaviors” is crucial for many biological applications such as wound healing, cancer growth, and embryo development. The root cause of emergent behavior can only be understood by investigating the interactions between individuals, since individual cells cannot perform such complex actions by themselves. Furthermore, it is expected that not all individuals contribute equally in developing, and especially in triggering, the emergent behavior of the collective. Indeed, collectives often contain at least one or a few individuals with the ability to influence and guide the other group members [5]. When an aggregate of individuals moves, one individual may trigger its neighbors to follow it; such an individual may be interpreted as a leader and those nearby as its followers. Thus, understanding the collective behavior of systems relies on understanding the influence of individuals and their interactions with each other, as well as on understanding the structure of leader-follower relationships in the system.
To examine the emergence of collective systems and the individuals that they are composed of, experimental data such as sequences of fluorescence images of single cells can be used, from which one can construct time series data that is associated with properties of individuals. Understanding the influence of individuals in the colony then can be achieved through reconstructing the causal relationships between the measured time series variables of interest representing different individuals. One way to examine the existence of such causal relationships is using measures such as cross-correlation [6] or Granger Causality [7]. Alternatively, one can use information-theoretic schemes such as time-delayed mutual information (TDMI) [8], transfer entropy (TE) [9], or intrinsic mutual information (IMI) [10].
In inferring causal relationships between two stochastic variables, one underlying assumption is that all other confounding variables, or indirect influences, can be accounted for. Systems of cells, however, such as developing organs, amoeba colonies, or cancer tumors, typically consist of millions of cells, which makes it computationally infeasible to condition on all other possibly relevant variables. Even for a single pair of cells, there are other assumptions that complicate causal inference from time series. For example, consider a time series of the positions of two cells, cell A and cell B. We would like to infer the influence of the position of cell A on that of cell B. Naturally, cell A’s position cannot instantaneously affect that of cell B, so we must consider some time lag between them; that is, we are inferring the influence of the present value of cell A on some future time point of cell B. However, cell B’s future position is also affected by its own present. Thus, when inferring the relationship between cell A and cell B with some time lag, the time-lagged measurements of B are themselves confounding factors. The past of A also has an effect on the present value of A, and when there is some bidirectional influence, the past of B would have had an effect on the present or earlier values of A. Thus, the past values of each cell, or the time-lagged influences, must be considered as confounding factors in inference using time series data of cells.
In addition to the challenges that arise due to confounding factors, collective systems can be difficult to measure, and their data, such as sequences of images of cells, can be difficult to interpret. For example, tracking individual cells by eye becomes almost infeasible when many cells are present, and automated tracking does not always produce reliable results. When trying to compute a measure of influence between cells in a large cell colony, it may be more prudent to analyze the system in the Eulerian reference frame. In this setting, one may obtain the velocity vector field as it evolves over time in the cell colony, and then compute the measure of influence (e.g., TE) between different regions of the colony. One technique to compute Eulerian velocity vector fields from sequences of images is particle image velocimetry (PIV) [11]. Later, we discuss how PIV has been validated to produce reliable velocity vector fields in cell colonies that vary in their level of coherent motion.
This article reviews three central issues in inferring the influence of individuals in collective systems:
1. What challenges arise due to large numbers of interactions and the effect of time-lagged influence in collective systems?
2. Given the above challenges, what measures are promising for understanding the individuals in collective systems?
3. In sequences of images of cell colonies where individual cell tracking is not feasible, how can we obtain reliable velocity data for inferring influence?
The rest of the paper is structured as follows: In the Measuring influence with information theory section, we review information-theoretic measures that are used to understand pairwise interactions in collective systems. In the Models section, we introduce graphical models that represent different types of time-lagged influence between a leader and a follower, as well as modifications to the Vicsek model that follow the same patterns of influence as the graphs. We then introduce a modified version of the Vicsek model which can simulate larger systems of leaders and followers. In the Review of results section, we review three key findings: 1) the number of agents that need to be conditioned on for inferring influence can be reduced greatly by utilizing the transfer entropy with a cutoff distance; 2) modes of information flow can help infer the time-lagged influence in collective systems, as well as improve estimates of influence given the challenges associated with indirect influence from the past of a variable; 3) when tracking individual cells becomes impractical, PIV measurements of cell motion, with suitably chosen parameters, can recover the velocity dynamics of the system. In the Conclusion and perspectives section, we summarize our paper and discuss some future directions of this area of research.
Information theory plays a pivotal role in advancing our understanding of causality within the context of time series data. Utilizing fundamental concepts from information theory, such as Shannon entropy and mutual information, researchers are empowered to quantitatively assess the extent of uncertainty and information within time series data sets. This methodological approach is instrumental in elucidating causal relationships by facilitating the identification of patterns, dependencies, and the transfer of information between variables over temporal dimensions.
The concept of information theory begins with Shannon entropy, or information entropy [12], simply referred to as entropy in this paper. Let \(X\) be a discrete stochastic variable that takes values \(x\) with probability \(p(x)\). The entropy of \(X\) is defined as

\[ H(X) = -\sum_{x} p(x) \log p(x). \]

Here, \(H(X)\) quantifies the average uncertainty of \(X\): it vanishes when \(X\) is deterministic and is maximal when all values of \(x\) are equally likely. For two stochastic variables \(X\) and \(Y\) with joint probability \(p(x, y)\), the mutual information is defined as

\[ I(X; Y) = \sum_{x, y} p(x, y) \log \frac{p(x, y)}{p(x)\,p(y)}. \]

Here, \(I(X; Y)\) quantifies the reduction in uncertainty about one variable obtained by knowing the other. Stochastic variables \(X\) and \(Y\) are statistically independent if and only if \(I(X; Y) = 0\).

While mutual information provides a measure of the statistical dependence between two stochastic variables, its inherent symmetry makes it inadequate for inferring causal direction. An alternative is time-delayed mutual information, which introduces a temporal aspect to capture potential causal relationships. Time-delayed mutual information considers the influence of one variable on another with a time lag, offering insights into the directionality of causal connections. The time-delayed mutual information from \(X\) to \(Y\) with time lag \(\tau\), denoted by \(I(X_t; Y_{t+\tau})\), is defined as

\[ I(X_t; Y_{t+\tau}) = \sum p(x_t, y_{t+\tau}) \log \frac{p(x_t, y_{t+\tau})}{p(x_t)\,p(y_{t+\tau})}. \tag{1} \]

From Eq. 1, the asymmetry of time-delayed mutual information is apparent, so that one can evaluate how much the current information of \(X\) reduces the uncertainty about the future of \(Y\). The transfer entropy from \(X\) to \(Y\) [9] additionally conditions on the present state of the target variable,

\[ T_{X \to Y} = \sum p(y_{t+\tau}, y_t, x_t) \log \frac{p(y_{t+\tau} \mid y_t, x_t)}{p(y_{t+\tau} \mid y_t)}, \tag{2} \]

where the sums run over all possible states \((x_t, y_t, y_{t+\tau})\).
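As a concrete illustration, TDMI (Eq. 1) and TE (Eq. 2) can be estimated from discretized time series with simple plug-in probability estimates. The sketch below is our own minimal implementation, not code from the reviewed works; it uses the chain-rule identity T_{X→Y} = I((X_t, Y_t); Y_{t+τ}) − I(Y_t; Y_{t+τ}):

```python
import numpy as np
from collections import Counter

def plug_in_mi(pairs):
    """Plug-in mutual information (in bits) from a list of symbol pairs."""
    n = len(pairs)
    pxy = Counter(pairs)
    px = Counter(a for a, _ in pairs)
    py = Counter(b for _, b in pairs)
    return sum((c / n) * np.log2(c * n / (px[a] * py[b]))
               for (a, b), c in pxy.items())

def tdmi(x, y, tau=1):
    """Time-delayed mutual information I(X_t; Y_{t+tau}), Eq. 1."""
    return plug_in_mi(list(zip(x[:-tau], y[tau:])))

def transfer_entropy(x, y, tau=1):
    """Transfer entropy T_{X->Y}, Eq. 2, computed via the chain rule as
    I(X_t, Y_t; Y_{t+tau}) - I(Y_t; Y_{t+tau})."""
    joint = list(zip(zip(x[:-tau], y[:-tau]), y[tau:]))
    return plug_in_mi(joint) - plug_in_mi(list(zip(y[:-tau], y[tau:])))

# Example: Y copies X with a one-step delay, so influence runs X -> Y only.
rng = np.random.default_rng(0)
x = rng.integers(0, 2, 10000)
y = np.roll(x, 1)                 # y[t] = x[t-1]
print(transfer_entropy(x, y))     # close to 1 bit
print(transfer_entropy(y, x))     # close to 0 bits
```

Plug-in estimation is adequate for this illustration; for short or continuous-valued time series, binning choices and estimator bias require care.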
Graph representations for models for different interaction types. (A) Type A. (B) Type B. (C) Type C. (D) Type D. [Reprinted with permission from Ref. [15]. Copyright ©2023, American Association for the Advancement of Science.]
To separate these contributions, one can use the intrinsic mutual information (IMI) [10], originally introduced in cryptography as a secret-key rate, defined as

\[ I(X_t; Y_{t+\tau} \downarrow Y_t) = \min_{p(\bar{y} \mid y_t)} I(X_t; Y_{t+\tau} \mid \bar{Y}), \tag{3} \]

and the synergistic information flow

\[ I_{\mathrm{syn}}(X \to Y) = T_{X \to Y} - I(X_t; Y_{t+\tau} \downarrow Y_t), \tag{4} \]

where the minimization in Eq. 3 runs over all stochastic mappings \(\bar{Y}\) of the present state \(Y_t\), so that Eq. 4 isolates the part of the transfer entropy that arises only from the joint effect of \(X_t\) and \(Y_t\) on \(Y_{t+\tau}\).

Similarly, James et al. [10] also proposed to decompose the time-delayed mutual information from \(X\) to \(Y\),

\[ I_{\mathrm{shared}}(X \to Y) = I(X_t; Y_{t+\tau}) - I(X_t; Y_{t+\tau} \downarrow Y_t), \tag{5} \]

where \(I_{\mathrm{shared}}\) is the shared information, interpreted as the information about the future of \(Y\) that the present of \(X\) carries only through the shared history of the two variables. With these definitions, the time-delayed mutual information decomposes into intrinsic and shared parts, and the transfer entropy into intrinsic and synergistic parts.
To examine the pitfalls of using TE for inferring influence, models with agents that have different dependencies on their past have been examined. Figure 1 represents four different types of two-agent models, labeled Type A, Type B, Type C, and Type D. A node labeled L or F in Figure 1 represents a leader or follower agent, respectively. A directed edge from one agent to another signifies that the first agent exerts direct influence on the second agent. An edge from an agent to itself signifies that the agent’s future state depends on its present. In all Types, L exerts direct influence on F, and F does not exert direct influence on L. In Type A, the futures of L and F both do not depend on their present states. In Type B, the future state of F depends on its present state, and the future state of L does not depend on its present state. In Type C, the future state of L depends on its present state, and the future state of F does not depend on its present state. In Type D, the future states of both L and F depend on their current states. In the Review of results section, we will discuss further how measuring influence from L to F and from F to L in these different models can lead to counter-intuitive results using TE or TDMI.
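To make the four dependency patterns concrete, the sketch below simulates hypothetical binary analogues of the graphs in Figure 1 (a toy of our own construction, not the Vicsek dynamics used in [15]) and estimates TDMI and TE from L to F with plug-in probabilities. Consistent with the decomposition of information flows, TE ≈ TDMI for Type A, TE exceeds TDMI for Type B (synergy), and TDMI exceeds TE for Type C (shared history):

```python
import numpy as np
from collections import Counter

def mi(a, b):
    """Plug-in mutual information (bits) between two symbol sequences."""
    n = len(a)
    pab, pa, pb = Counter(zip(a, b)), Counter(a), Counter(b)
    return sum((c / n) * np.log2(c * n / (pa[u] * pb[v]))
               for (u, v), c in pab.items())

def tdmi_and_te(L, F):
    """(TDMI, TE) from L to F with lag 1, TE via the chain rule."""
    d = mi(L[:-1], F[1:])
    t = mi(list(zip(L[:-1], F[:-1])), F[1:]) - mi(F[:-1], F[1:])
    return d, t

def simulate(kind, n=60000, eps=0.1, seed=1):
    """Toy binary analogue of the graphs in Figure 1. L always drives F
    (with bit-flip noise eps); a self-loop means the agent's next state
    copies its present state half of the time."""
    rng = np.random.default_rng(seed)
    L, F = [0] * n, [0] * n
    for t in range(n - 1):
        noisy_L = L[t] ^ (rng.random() < eps)
        L[t + 1] = L[t] ^ (rng.random() < eps) if kind in "CD" else int(rng.integers(0, 2))
        F[t + 1] = F[t] if kind in "BD" and rng.random() < 0.5 else noisy_L
    return L, F

for kind in "ABC":
    d, t = tdmi_and_te(*simulate(kind))
    print(kind, round(d, 3), round(t, 3))
# Type A: TE ~ TDMI; Type B: TE > TDMI (synergy); Type C: TDMI > TE (shared).
```

The noise rate and copy probability are arbitrary choices for illustration; any values that preserve the graph structure show the same qualitative ordering.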
Our Modified Vicsek Model

Over the last few decades, many researchers have made considerable efforts to model the dynamic characteristics of individuals in the collective behavior of multi-agent systems [16–20]. One of the widely studied models in this context is the Vicsek model (VM), due to its simple mathematical design and its effectiveness in explaining the fundamental properties of collective motion [17]. The basic principle of the VM is that each individual moves and updates its direction of motion based on the average direction of motion of its neighbors at each time step, in the presence of some noise. However, the VM does not distinguish between individuals’ roles as leaders and followers, as all individuals have the same influence on their neighbors. Therefore, Basak et al. [14,21] modified the original Vicsek model by integrating asymmetric interactions among individuals, designating individuals with greater influence strength as leaders and those with lower strength as followers.
They observe the trajectories of N self-propelled individuals within a square box of length L subject to periodic boundary conditions, where all individuals are randomly positioned and oriented at the initial time. Each individual \(i\) moves with a constant speed \(v_0\) and updates its position according to

\[ \mathbf{x}_i(t + \Delta t) = \mathbf{x}_i(t) + \mathbf{v}_i(t)\,\Delta t, \tag{6} \]

where \(\mathbf{v}_i(t)\) is the velocity of individual \(i\), with magnitude \(v_0\) and direction given by the angle \(\theta_i(t)\). The direction of motion is updated as

\[ \theta_i(t + \Delta t) = \langle \theta(t) \rangle_i + \Delta\theta. \tag{7} \]

The term \(\langle \theta(t) \rangle_i\) denotes the average direction of motion of the individuals within the interaction radius of individual \(i\) (including \(i\) itself), where the sum counts the individuals \(j\) satisfying \(|\mathbf{x}_j(t) - \mathbf{x}_i(t)| < r\). The parameter \(\Delta\theta\) represents angular noise, drawn uniformly at random from an interval whose width is set by the noise strength \(\eta_0\). In the modified model [14,21], each individual contributes to this average with a weight reflecting its influence strength, so that leaders, carrying larger weights, steer the direction of their neighbors more strongly than followers do.
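The Vicsek update rule described above can be sketched in a few lines. The weighted average below is a simplified stand-in for the asymmetric leader-follower rule of [14,21] (the exact weighting scheme there may differ), and all parameter values are illustrative:

```python
import numpy as np

def vicsek_step(pos, theta, v0=0.03, r=1.0, eta=0.2, box=5.0, weights=None, rng=None):
    """One update of a Vicsek-type model with periodic boundaries (Eqs. 6-7).
    `weights` assigns each agent an influence strength: high-weight 'leaders'
    pull the local average direction more than low-weight 'followers'."""
    rng = np.random.default_rng() if rng is None else rng
    n = len(pos)
    w = np.ones(n) if weights is None else np.asarray(weights, float)
    # pairwise displacements with the minimum-image convention
    d = pos[:, None, :] - pos[None, :, :]
    d -= box * np.round(d / box)
    neighbors = (d ** 2).sum(-1) <= r ** 2        # includes self
    # weighted circular mean of neighbor headings, then uniform angular noise
    wx = (neighbors * (w * np.cos(theta))[None, :]).sum(1)
    wy = (neighbors * (w * np.sin(theta))[None, :]).sum(1)
    new_theta = np.arctan2(wy, wx) + rng.uniform(-eta / 2, eta / 2, n)
    new_pos = (pos + v0 * np.stack([np.cos(new_theta), np.sin(new_theta)], 1)) % box
    return new_pos, new_theta

# usage: 100 agents with random headings align at low noise
rng = np.random.default_rng(0)
pos = rng.uniform(0, 5.0, (100, 2))
theta = rng.uniform(-np.pi, np.pi, 100)
for _ in range(400):
    pos, theta = vicsek_step(pos, theta, rng=rng)
order = np.hypot(np.cos(theta).mean(), np.sin(theta).mean())
print(order)   # near 1 at this noise level
```

The global order parameter computed at the end is the standard diagnostic for the ordered phase of the VM; sweeping `eta` reproduces the order-disorder transition.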
We introduced information-theoretic methodologies aimed at deducing the domain of interaction by analyzing the trajectories of agents. These methodologies hinge on a parameter referred to as the ‘cutoff distance’ λ: the transfer entropy between agents is averaged over pairs separated by less than λ, and the dependence of this average on λ reveals the domain of interaction.
In the context of the problem, we conceptualized the domain of interaction as a circular area centered on each agent, whose radius sets the range over which the agent interacts with others.
Take note that the previously mentioned papers [14,21,22] presumed uniformity in the interaction domains among all agents in the system, conforming to the interaction rules of the VM. However, this assumption may not always hold true, as the interaction domain can vary among individuals based on factors such as chemical signals, phenotypic variation, and other characteristics. For instance, in Dd colonies, cells exhibit varying levels of excitability. The communication among Dd cells involves a diffused chemical signal, and the radius of influence of a cell is contingent upon the excitability of the cell it is influencing. This steers our efforts towards the objective of employing the TE versus cut-off technique in real-world systems that deviate from ideal VM rules. A modified VM, proposed in [23], deviates from the assumption of agents sharing uniform interaction domains. In this model, an agent influences the motion of another agent only when the former is located inside the interaction domain of the latter, which is similar to models of auditory sensing in bats [24]. Hence, an agent with a larger interaction radius is more susceptible to external influences from other agents in the system than an agent with a shorter interaction radius. Hence, in a system involving two agents with varying interaction domains, the agent with the shorter interaction domain is regarded as the leader, while the second agent is recognized as the follower (Figure 2).
Particle interaction depends on position. (A) the solid particle’s motion is influenced by the dotted particle within its interaction domain, but not vice versa. (B) both particles influence each other as they are within each other’s interaction domains. (C) particles move independently since they are outside each other’s interaction domains. [Reprinted with permission from Ref. [23]. Copyright ©2023 AIP Publishing LLC.]
How can we identify the number of interaction radii in a system? Consider a two-agent system in which the agents have two different interaction radii. As the cutoff distance λ is increased past each of these radii, the derivative of the average TE with respect to λ shows a distinct extremum, so the number of such extrema reveals the number of distinct interaction radii in the system.
Derivative of the average TE with respect to the cutoff distance λ, plotted as a function of λ.
How can one determine the interaction radii of specific agents? Understanding the interaction characteristics between agents is crucial for establishing each agent’s interaction domain. In the modified VM described in [23], an agent is influenced by another agent only when the latter lies inside the former’s interaction domain, so the inward TE of a given agent, examined as a function of the cutoff distance, carries the signature of that agent’s own interaction radius.
The derivative of the average inward TE for particle 1 with respect to the cutoff distance λ is depicted as a function of λ. In the inset, the average inward TE of particle 1 is illustrated as a function of the cutoff distance λ. [Reprinted with permission from Ref. [23]. Copyright ©2023 AIP Publishing LLC.]
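The λ-sweep itself is simple once pairwise TE estimates and distances are in hand. The sketch below uses synthetic TE values (a hypothetical system in which pairs interact only within radius 2.0, not data or code from [23]) to show how the steepest drop in the average TE versus λ marks the interaction radius:

```python
import numpy as np

def avg_te_vs_cutoff(te, dist, lambdas):
    """Average pairwise TE over all ordered pairs closer than each cutoff.
    te[i, j]: estimated TE from agent j to agent i; dist[i, j]: their
    (time-averaged) distance."""
    off = ~np.eye(len(te), dtype=bool)
    out = []
    for lam in lambdas:
        sel = off & (dist <= lam)
        out.append(te[sel].mean() if sel.any() else 0.0)
    return np.array(out)

# Synthetic demonstration (made-up TE values, not a real estimator): pairs
# interact, and hence carry TE, only when closer than the true radius 2.0.
rng = np.random.default_rng(0)
n = 60
dist = rng.uniform(0.1, 6.0, (n, n))
dist = (dist + dist.T) / 2
te = np.where(dist <= 2.0, 0.5, 0.0) + rng.normal(0, 0.01, (n, n))
lambdas = np.linspace(0.2, 6.0, 30)
curve = avg_te_vs_cutoff(te, dist, lambdas)
# The average stays high while lambda is below the interaction radius, then
# falls as non-interacting pairs dilute it; the steepest drop marks the radius.
est_radius = lambdas[np.argmin(np.diff(curve))]
print(est_radius)   # close to 2.0
```

In practice the `te` matrix would come from estimates such as Eq. 2 applied to agent trajectories; only the λ-sweep logic is shown here.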
In developing the theory of TE, Schreiber originally noted that the TDMI from the present state of X to the future of Y contains information from the present state of Y itself. Later, this was termed shared information (Eq. 5), which is interpreted as the shared history between the two variables. In order to interpret TDMI as information flow, an assumption must be made that the present state of Y does not influence its future state. TE is widely thought to relieve this assumption because TE measures the reduction in uncertainty of the future state of Y from knowing the present state of X, while considering the present state of Y as being already known. James et al. [10], however, have shown that TE fails to remove the so-called synergistic information, in which the present states of X and Y jointly predict the future of Y beyond what either provides alone.
The models shown in Figure 1 are meant to build upon the notion of shared history in computing information flow measures. In [15], the VM corresponding to each graph in Figure 1A–D was simulated, and the three information measures TE, TDMI, and IMI were computed. It was shown that in Type A, where neither the future of L nor that of F depends on its present, TDMI, TE, and IMI all give the same result in terms of measuring how well the present of L can predict the future of F; in other words, the shared and synergistic effects are zero. In Type B, where the future state of F depends on its present, TE exceeds IMI: the difference is synergistic information, which arises because the present states of L and F jointly determine the future of F. In Type C, where the future of L depends on its present, TDMI exceeds TE and IMI, the difference being the shared information carried by the common history of the two agents. In Type D, both shared and synergistic information flows are present. These results are summarized in Table 1.
Summary of information flow modes for interaction Types A–D
| Interaction Type | Intrinsic | Synergistic | Shared | Dependency on past history |
|---|---|---|---|---|
| Type A | >0 | 0 | 0 | Both L and F do not depend on their past |
| Type B | >0 | >0 | 0 | Only F depends on its past |
| Type C | >0 | 0 | >0 | Only L depends on its past |
| Type D | >0 | >0 | >0 | Both L and F depend on their past |
It is natural to expect the presence of multiple individuals acting as leaders in the dynamics of collective behavior observed in real-world phenomena in nature. Hence, in general, multiple leaders may contribute to the dynamics of followers in collective behavior, and by observing their interactions, greater insight can be gleaned about collective behavior compared to scenarios where followers are influenced by a single leader. Sattari et al. [15] discussed pairwise information flows as a function of the noise strength η0 for models with different numbers of leaders and followers.
S as a function of noise level η0 (in units of π radians) for a model with different numbers of leaders and followers. (A) SL→F, and (B) SF→L as a function of η0 (in units of π radians) for four agents with one leader and three followers (blue) and two leaders and two followers (red). (C) SL→F and (D) SF→L for eight agents with one leader and seven followers (blue), two leaders and six followers (red), three leaders and five followers (yellow), and four leaders and four followers (purple). (E) SL→F and (F) SF→L with three followers and one leader (blue), two leaders (red), and three leaders (yellow). (G) Graph representation of model A, where there is one leader and three followers. (H) Graph representation of model A, where there are two leaders and two followers. [Reprinted with permission from Ref. [15]. Copyright ©2023, American Association for the Advancement of Science.]
T as a function of noise level η0 (in units of π radians) for model A with one leader and different numbers of followers. Here, NF=1 (blue), 3 (red), 7 (yellow), and 15 (purple), where the number of leaders is always one. [Reprinted with permission from Ref. [15]. Copyright ©2023, American Association for the Advancement of Science.]
This confirms that the bumps emerge because of the simultaneous impact of the current states of both the leader and the follower. Then, to understand how this structure (the bumps) changes as the number of leaders increases, models with different numbers of leaders and followers were compared.
Particle tracking velocimetry (PTV) is a widely recognized method for obtaining a “Lagrangian descriptor” of velocity dynamics by tracking individual cells. However, manual implementation for numerous cells is labor-intensive, and automated approaches using supervised or unsupervised techniques can be problematic, especially with high cell density or rapid cell movement. In contrast, an Eulerian descriptor of the velocity field can be obtained through particle image velocimetry (PIV), where flow properties are expressed as a field without identifying individual cells. We conducted a thorough exploration to identify and understand the optimal PIV parameters specifically tailored for the examination of cell motility. To accomplish this, we employed simulation models, making it possible to compare the PIV velocity vector field to the motions of the underlying individual agents [25].
PIV captures the flow patterns of a fluid as it passes a stationary observation point. It determines velocity vectors by identifying a zone in a subsequent frame that best matches a given grid unit cell in the current frame. For that, each image is initially divided into grid unit cells (simply termed “grids” unless otherwise noted), with the center of each grid remaining constant across all images. For a grid at time t, the velocity vector is obtained by finding the displacement that maximizes the similarity between the intensity pattern of the grid and a zone in the image at time t + Δt, and dividing this displacement by the frame interval Δt.
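A minimal version of this matching step can be written directly (our own sketch; production PIV packages use FFT-based cross-correlation with subpixel refinement, so this shows the matching idea only):

```python
import numpy as np

def piv_displacement(frame_a, frame_b, cy, cx, win=8, search=4):
    """Displacement of the win x win grid centered at (cy, cx) between two
    frames, found by maximizing zero-mean correlation over integer shifts
    in [-search, search] in each direction."""
    h = win // 2
    a = frame_a[cy - h:cy + h, cx - h:cx + h].astype(float)
    a -= a.mean()
    best, best_shift = -np.inf, (0, 0)
    for dy in range(-search, search + 1):
        for dx in range(-search, search + 1):
            b = frame_b[cy - h + dy:cy + h + dy, cx - h + dx:cx + h + dx].astype(float)
            score = (a * (b - b.mean())).sum()
            if score > best:
                best, best_shift = score, (dy, dx)
    return best_shift   # velocity = displacement / frame interval

# usage: a texture shifted by (2, 1) pixels is recovered exactly
rng = np.random.default_rng(0)
frame_a = rng.random((32, 32))
frame_b = np.roll(frame_a, (2, 1), axis=(0, 1))
print(piv_displacement(frame_a, frame_b, 16, 16))   # (2, 1)
```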
In simulating the VM, we knew the positions and velocities of individual particles at each time instance. We utilized these trajectories to assess the performance of PIV in capturing the motion of single agents. The simulation involved generating images of the particles by plotting markers representing agents at each particle location, oriented in the direction of particle motion. Subsequently, the PIV technique was applied to these images, enabling a direct comparison between the velocities computed by PIV and the velocities known from the trajectories, assessed through the alignment score,
\[ A_R = \frac{1}{n_R} \sum_{i=1}^{n_R} \cos \varphi_i, \tag{8} \]

where \(\varphi_i\) is the angle between the PIV vector of the grid and the velocity vector of particle \(i\), and \(n_R\) is the number of particles located within distance \(R\) of the grid center. \(A_R\) equals 1 when the nearby particles move along the PIV vector, \(-1\) when they move opposite to it, and is close to 0 when they move in random directions.
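Eq. 8 amounts to an average cosine and is straightforward to compute for one grid; the sketch below uses made-up variable names for illustration:

```python
import numpy as np

def alignment_score(piv_vec, positions, velocities, center, R):
    """A_R of Eq. 8: mean cosine between one grid's PIV vector and the
    velocities of the particles within distance R of the grid center.
    Returns np.nan when no particle lies inside the circle."""
    near = np.linalg.norm(positions - center, axis=1) <= R
    if not near.any():
        return np.nan
    v = velocities[near]
    cosines = (v @ piv_vec) / (np.linalg.norm(v, axis=1) * np.linalg.norm(piv_vec))
    return cosines.mean()

piv = np.array([1.0, 0.0])
pos = np.array([[0.1, 0.0], [0.0, 0.2], [0.3, 0.1]])
vel = np.array([[2.0, 0.0], [1.0, 0.0], [0.5, 0.0]])
print(alignment_score(piv, pos, vel, np.array([0.0, 0.0]), 1.0))    # 1.0
print(alignment_score(piv, pos, -vel, np.array([0.0, 0.0]), 1.0))   # -1.0
```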
Schematic illustrates particle groups with diverse alignment scores (AR), with the thick shaded arrow representing a PIV vector defined over a grid, and the ovals with arrows representing the moving particles along with their respective directions of movement. (a) AR close to –1 due to opposing PIV vector; (b) AR close to 0 for randomly moving particles; (c) AR close to 1 when PIV vector aligns with particle movement. [Reprinted with permission from Ref. [25]. Copyright ©2023 Springer Nature Limited.]
The alignment score, defined in Eq. 8, was utilized to assess the extent to which PIV vectors reflect the motion of nearby particles. The alignment score was evaluated over time for various values of the radius R under different noise conditions.
The alignment score AR is plotted against time t for various radius R values under different noise conditions.
The optimal values of both the PIV grid size and the radius R were explored jointly, with the alignment score serving as the criterion: parameter combinations yielding a high alignment score produce PIV vectors that faithfully reflect the motion of the underlying particles. Figure 9 illustrates the landscape of AR as a function of these parameters; the region of high AR identifies the parameter choices best suited for capturing the velocity dynamics of the system.
The landscape of the alignment score AR.
The information-theoretic and image analysis techniques highlighted in this article are applicable to understanding the structure of a wide variety of many-body systems from observations alone. For example, in observing a swarm of unmanned aerial vehicles (UAVs) or animals, one may seek knowledge about which individuals are most influential or how their influence propagates through the system. In that case, we propose that analysis based on modes of information flow or PIV can help extract important insights. More work needs to be done, however, before one can apply these techniques to heterogeneous systems, for example, the human body, where the interactions between parts of the system occur in different ways and at different size and time scales.
Specifically for the analysis of large colonies of cells, recent developments in live cell imaging can capture both micro- and macro-scale information simultaneously, achieving individual cell resolution while also providing a viewing window at the cell colony scale. Owing to this ability to extract information about multiple scales simultaneously, this approach has been termed “trans-scale” imaging, and it can provide crucial insights for addressing the question of how an individual cell influences the collective.
Inferring the relationships between individual cells from these huge data sets requires new mathematical tools, owing to challenges specific to inference in systems of large numbers of interacting agents. We have reviewed the effects of individual and shared history on inference measures, and propose that the decomposition of transfer entropy and time-delayed mutual information into intrinsic, shared, and synergistic information flows can help to elucidate the effects of history.
The interaction domain of individuals is one crucial parameter for inferring influence, since knowing it reduces the number of candidate agents that can influence a given agent. We have shown that a distance-dependent transfer entropy can determine the interaction domain of agents in a collective even when the sizes of the domains are heterogeneous across individuals. We have also addressed tools that can extract information about the network structure and the role of cell memory in collectives using modes of information flow. It was demonstrated using simple two-agent models that shared history determines the existence of shared or synergistic information flows. In systems of more than two agents, it was shown that synergistic information flow can distinguish systems that have follower-to-follower interaction in addition to leader-to-follower interaction from those which only have leader-to-follower interaction.
In addition to the information-theoretic techniques for extracting information from trajectories, we have analyzed how PIV can extract relevant velocity information from cell images even when individual cell tracking is not possible. This can be applied in conjunction with information theory in future works. For example, using PIV, one can extract velocity fields of regions, and then use transfer entropy dependent on distance to infer the domain of influence of cell motion or the network structure of interaction between different regions of cells. This can help understand the emergent properties of systems of cells that cannot be addressed by studying individual cells alone.
Understanding emergent behavior is a task whose benefit will extend beyond cell colony dynamics. Emergent behavior has puzzled scientists in many disparate fields such as artificial intelligence [26], social science [27], robotics [28], and ecology [29]. Since information-theoretic methods do not confine their applicability to a single model, the works summarized here may be applied to understanding how individual influence plays a role in emergent behavior in different fields of research.
The data that support the findings of this study are available from the corresponding author upon request.
The codes are available from the corresponding author upon request.
We thank Prof. Kazuki Horikawa for his continuous fruitful discussions. This work was supported by a Grant-in-Aid for Scientific Research on Innovative Areas Singularity Biology (No. 8007) (Grant No. 18H05413), MEXT, the research program of Five star Alliance in NJRC Matter and Dev (Grant No. 20191062-01), the JSPS (Grant Nos. 25287105 and 25650044, T.K.), and the JST/CREST (Grant No. JPMJCR1662, T.K.). U.S.B is supported by the University Grants Commission (UGC) of Bangladesh research grant (grant no. Physical Science-72-2021) and the Pabna University of Science and Technology research fund. M.T. is supported by the Research Program of “Dynamic Alliance for Open Innovation Bridging Human, Environment and Materials” in “Network Joint Research Center for Materials and Devices” and a Grant-in-Aid for Scientific Research (C) (No. 22654047, No. 25610105, No. 19K03653, and No. 23K03265) from JSPS. The composition of our A02-02 group and our collaborators in the Singularity Biology project are displayed in Table 2.
Our A02-2 group composition and collaborators in the Singularity Biology
Principal Investigator: | Tamiki Komatsuzaki (Hokkaido University) |
Co-Investigator (CI): | Atsuyoshi Nakamura (Hokkaido University) |
Shunsuke Ono (Tokyo Institute of Technology) | |
Collaborating Researcher (CR) | Ichigaku Takigawa (Kyoto University) |
Collaborators in the Singularity Biology | |
A01-1 PI | Tomonobu Watanabe (RIKEN/Hiroshima University) |
A01-2 PI | Takeharu Nagai (Osaka University) |
A03-2 PI | Kazuki Horikawa (Tokushima University) |
Publicly offered research A03 | Satoshi Sawai (Tokyo University) |
Publicly offered research A03 | Masakazu Sonoshita (Hokkaido University) |
All authors declare that they have no conflict of interest.
T.K., S.S. designed the project over discussions. U.S.B. and S.S. conducted analyses and experiments. All authors wrote and confirmed the contents of the manuscript.