DeepFoids: Adaptive Bio-Inspired Fish Simulation with Deep Reinforcement Learning

DeepFoids is an adaptive, bio-inspired fish simulation that uses deep reinforcement learning to autonomously produce schooling behavior in groups of fish, adapting to changes in the environment.

Background

A stable, inexpensive, and safe supply of fish is vitally important, and the aquaculture industry is attracting attention as a means of protecting this natural resource. However, on fish farms, wasted feed and fecal matter can negatively affect the surrounding environment. Optimizing feeding is an effective way to mitigate this problem.

However, performing experiments in the field to optimize feeding is not only cost and time intensive, but will itself be a source of environmental impact until the experiments succeed.

Our approach

To address this, we developed a simulated ecosystem of schooling fish using computer graphics. By running many simulations in CG, the costs in time, money, and environmental impact can be avoided.

To accomplish this, it is necessary to reproduce realistic fish behavior. In addition to individual fish behavior, the autonomous generation of various schooling behaviors is also needed. By introducing ecological parameters to individual fish, and then training them, we succeeded in generating various collective behaviors, such as swarming and milling.

Biological approach

We have introduced the following ecological parameters into our simulation; a minimal configuration sketch follows the list.

  1. The fish’s preferred temperature
  2. The fish’s preferred light intensity
  3. The personal space of each fish
  4. The fish’s response to obstacles and the water surface, expressed as rewards
  5. A decision-making interval
  6. The social ranking of fish
  7. A range of swimming speeds relative to body length
  8. Cohesion, alignment, and separation, expressed as rewards
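To make this parameterization concrete, here is a minimal sketch of how such per-species parameters could be grouped in Python. All names, types, and default values are illustrative assumptions, not the actual implementation.

```python
from dataclasses import dataclass

@dataclass
class SpeciesParams:
    """Illustrative per-species ecological parameters (all values are assumptions)."""
    preferred_temp_c: tuple[float, float] = (12.0, 16.0)      # preferred temperature band (deg C)
    preferred_light_lux: tuple[float, float] = (50.0, 500.0)  # preferred light intensity band
    personal_space_bl: float = 0.5        # personal space, in body lengths
    decision_interval_s: float = 0.2      # time between action decisions (s)
    social_rank: int = 0                  # 0 = subordinate; higher = more dominant
    speed_range_bls: tuple[float, float] = (0.5, 2.0)  # swim speed range, body lengths / s
    w_cohesion: float = 1.0               # reward weights for the schooling terms
    w_alignment: float = 1.0
    w_separation: float = 1.0
```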

Deep Reinforcement Learning (DRL)

Each fish is treated as an agent with a complex decision-making process in a multi-agent system, in which individual rewards and collective rewards compete. Deep reinforcement learning is then applied to this system.
In real fish, dopamine is known to act as a reward signal during learning, a biological analogue of reinforcement learning. This motivated our adoption of deep reinforcement learning for the fish simulation.

State in Reinforcement Learning

In the same way that real fish sense only relative motion through their lateral line, the fish agents observe only relative directions. Each agent observes the difference between its own forward direction and that of the nearest neighboring fish in front of it, within its species’ defined sensing range, together with its current depth, and stores these observations as a 3D tensor.
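As a rough illustration of this observation step, the sketch below computes the heading difference to the nearest in-front neighbor and the agent's depth. The vector conventions (y-up, surface at y = 0) and the "in front" test are assumptions for illustration only.

```python
import numpy as np

def observe(agent_pos, agent_fwd, neighbor_pos, neighbor_fwd, sense_range):
    """Hypothetical observation: heading difference to the nearest neighbor
    in front of the agent (within sensing range), plus the agent's depth."""
    best_fwd, best_dist = None, np.inf
    for pos, fwd in zip(neighbor_pos, neighbor_fwd):
        rel = pos - agent_pos
        dist = np.linalg.norm(rel)
        # "in front" = positive projection of the offset onto the agent's heading
        if dist < min(best_dist, sense_range) and np.dot(rel, agent_fwd) > 0.0:
            best_fwd, best_dist = fwd, dist
    heading_diff = np.zeros(3) if best_fwd is None else best_fwd - agent_fwd
    depth = -agent_pos[1]  # assumes y-up with the water surface at y = 0
    return heading_diff, depth
```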

Action in Reinforcement Learning

Each agent chooses an orientation and a forward movement speed.
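A minimal sketch of how such an action could be applied each decision step follows; the yaw-only turn and the speed clamp to the species' allowed range (see SpeciesParams above) are assumptions.

```python
import numpy as np

def apply_action(pos, fwd, turn, speed, params, dt):
    """Hypothetical action step: rotate the heading about the vertical axis,
    then advance at a speed clamped to the species' swimming-speed range."""
    c, s = np.cos(turn), np.sin(turn)
    fwd = np.array([c * fwd[0] + s * fwd[2],   # yaw rotation, y-up convention
                    fwd[1],
                    -s * fwd[0] + c * fwd[2]])
    fwd /= np.linalg.norm(fwd)
    speed = np.clip(speed, *params.speed_range_bls)  # body lengths / s
    return pos + fwd * speed * dt, fwd
```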

Reward/Penalty setting

The reward r_t at each training iteration is computed as the sum of the following seven sub-components.

r_t = r_t^BC + r_t^NC + r_t^BD + r_t^ND + r_t^E + r_t^M + r_t^C

  • r^BC : the penalty given when a fish collides with the cage or water surface.
  • r^NC : the penalty given when fish collide with each other.
  • r^BD : the reward given when a fish avoids the cage and water surface.
  • r^ND : the reward given when a fish is close to other fish.
  • r^E : the penalty given when a fish expends energy.
  • r^M : the reward given when a fish swims faster than the minimum speed without changing orientation, or the penalty given when it abruptly changes depth.
  • r^C : the reward given when a fish successfully attacks its target; for the victim of the attack, it is a penalty.

After each decision-making interval, each fish receives a reward or penalty based on its actions.
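Read concretely, the per-step reward can be assembled as a plain sum of the seven terms. The component names below are placeholders for illustration, not the paper's exact definitions.

```python
# Hypothetical assembly of r_t = r^BC + r^NC + r^BD + r^ND + r^E + r^M + r^C.
REWARD_TERMS = [
    "boundary_collision",   # r^BC: penalty for hitting the cage or surface
    "neighbor_collision",   # r^NC: penalty for fish-fish collisions
    "boundary_distance",    # r^BD: reward for avoiding the cage and surface
    "neighbor_distance",    # r^ND: reward for staying close to other fish
    "energy",               # r^E : penalty for energy expenditure
    "movement",             # r^M : reward for steady fast swimming,
                            #       penalty for abrupt depth changes
    "combat",               # r^C : reward to the attacker, penalty to the victim
]

def total_reward(components: dict[str, float]) -> float:
    """Sum the seven sub-rewards; missing components default to zero."""
    return sum(components.get(name, 0.0) for name in REWARD_TERMS)
```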

Learning process

These data are fed to the DRL network, and schooling behavior is learned.
As a result, the agents autonomously acquired different swarming behaviors in sparse and dense conditions.
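For orientation, a single-agent analogue of this training loop might look like the following with Stable-Baselines3; FishSchoolEnv-v0 is a hypothetical Gymnasium wrapper around the simulation, and the actual system is multi-agent.

```python
import gymnasium as gym
from stable_baselines3 import PPO

# Hypothetical environment: observations are the heading difference + depth,
# actions are a continuous (turn, speed) pair, rewards follow r_t above.
env = gym.make("FishSchoolEnv-v0")

model = PPO("MlpPolicy", env, verbose=1)
model.learn(total_timesteps=1_000_000)
model.save("deepfoids_ppo")
```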

Simulation results

Simulation vs. Real

We compared the simulation results with real footage. When the population is sparse, the simulation produces a swarm pattern with low polarization that aligns with the real-world data. With a dense population, the simulated fish form a milling pattern, as in the real footage.
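Polarization can be quantified with the standard schooling order parameters; the sketch below is a common formulation from the collective-behavior literature, not necessarily the paper's exact metric.

```python
import numpy as np

def polarization(headings: np.ndarray) -> float:
    """Norm of the mean unit heading over an (N, 3) array of forward vectors:
    ~1 for an aligned school, ~0 for a disordered swarm or a mill."""
    return float(np.linalg.norm(headings.mean(axis=0)))

def milling(positions: np.ndarray, headings: np.ndarray) -> float:
    """Rotation order parameter: mean tangential alignment about the group
    center (~1 for a milling vortex). Assumes a y-up coordinate system."""
    rel = positions - positions.mean(axis=0)
    rel /= np.linalg.norm(rel, axis=1, keepdims=True)
    return float(np.abs(np.cross(rel, headings)[:, 1].mean()))
```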

PPO vs. SAC

Both PPO (Proximal Policy Optimization) and SAC (Soft Actor-Critic) produced milling patterns for 1,000 coho salmon.
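Under the hypothetical Stable-Baselines3 setup sketched earlier, swapping the algorithm is a one-line change; SAC natively supports the continuous (turn, speed) action space.

```python
from stable_baselines3 import SAC

model = SAC("MlpPolicy", env, verbose=1)  # same assumed FishSchoolEnv as above
model.learn(total_timesteps=1_000_000)
```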

Social ranking

The simulation also succeeded in generating aggressive behaviors in which dominant fish chase and strike subordinate fish.

Application

We succeeded in generating schooling behavior and utilized these results to produce training data for computer vision, implementing a fish-counting algorithm. To create realistic visuals, we visited fish farms to film the fish and their environment, and then reproduced them in CG. We automatically generated an annotated synthetic dataset and used it as training data.
For test data, we captured video at the fish farms.
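Annotation of the synthetic frames comes essentially for free, since every fish's pose is known at render time. The COCO-style record format below is an assumption for illustration.

```python
def annotate_frame(frame_id: int, fish_screen_boxes) -> list[dict]:
    """Hypothetical auto-annotation: emit one label per rendered fish, using
    its projected screen-space bounding box (known exactly at render time)."""
    return [
        {"image_id": frame_id, "category": "fish",
         "bbox": [x, y, w, h]}  # COCO convention: [x, y, width, height]
        for (x, y, w, h) in fish_screen_boxes
    ]
```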
