Enhancing Satellite Planning for Earth Observation
A Novel Approach Using Graph Neural Networks and Deep Reinforcement Learning
Earth Observation Satellite Planning (EOSP) addresses the complex task of scheduling observations for satellites that monitor our planet. The process involves selecting which observations to prioritize, given that there are many more requests than the satellite can handle within its time limits and maneuvering restrictions. To make effective plans, it is essential to choose the observations that will yield the greatest benefit and then build a schedule that can accommodate them. Previous research largely relied on traditional methods for solving these problems; this study introduces a new approach based on Graph Neural Networks (GNNs) and Deep Reinforcement Learning (DRL). GNNs extract useful features from the data, which is structured as graphs representing different scheduling scenarios, while DRL is used to search for the best possible schedules. Simulation results indicate that the method learns successfully from smaller examples and transfers that knowledge to larger, real-world instances, offering strong results compared to older methods.
The Earth Observation Satellite Planning Problem
The focus is on Earth observation satellites (EOS) that capture images of different locations on Earth to meet user demands. An agile EOS can target areas beyond the point directly below it, known as the nadir. These satellites operate in low orbit, so the window during which an area is visible is longer than the time it takes to capture its image. Pointing the satellite at successive targets requires changing its orientation, described by pitch, roll, and yaw, and these attitude maneuvers introduce delays whose length depends on the pair of observations involved and the required transition.
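As a rough illustration of these constraints, the sketch below checks whether one observation can directly follow another once the maneuver is accounted for; the function and its arguments are assumptions for illustration, not the paper's formulation.

```python
# Minimal sketch (not the paper's code): can observation b directly follow
# observation a, given a transition time that depends on the pitch/roll/yaw
# change between the two pointings?
def can_follow(end_a: float, transition_time: float,
               window_start_b: float, window_end_b: float,
               duration_b: float) -> bool:
    earliest_start_b = max(end_a + transition_time, window_start_b)
    # b is feasible after a only if it can still finish inside its visibility window
    return earliest_start_b + duration_b <= window_end_b
```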
Typically, these satellites face a challenge known as the Earth Observation Satellite Planning problem (EOSP). It arises because there are usually more requested observations than the satellite can complete during its operational time. Different observations may carry different levels of utility, so it is necessary to choose the best mix of tasks to fulfill user needs and to build a schedule that respects the satellite's operational limits. The most complicated EOSP cases involve managing multiple satellites with different orbits and tasks, which may include imaging the same area several times, at different moments or with different satellites.
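To make the ingredients of the problem concrete, here is one possible way to represent a single observation request in Python; the field names are assumptions, not the paper's notation.

```python
from dataclasses import dataclass

# Illustrative only: one way to describe a single-image observation request
# with the quantities discussed above (utility, visibility window, duration).
@dataclass(frozen=True)
class ObservationRequest:
    obs_id: int
    utility: float        # benefit of acquiring this image
    window_start: float   # earliest time the target is visible (s)
    window_end: float     # latest time the acquisition may end (s)
    duration: float       # time needed to take the image (s)
```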
Simplifying the Scheduling Problem
In this research, the analysis is limited to simpler scenarios involving a single satellite operating on one orbit and taking single images. Even this restricted scheduling problem is notoriously difficult to solve: for realistic situations with thousands of candidate observations, finding an optimal schedule is not feasible. Instead, researchers typically look for approximate solutions using algorithms that provide good enough results rather than provably optimal ones.
A popular method for creating such solutions is the Greedy Randomized Adaptive Search Procedure (GRASP). At the same time, there is growing interest in using deep learning for optimization tasks; in particular, Deep Reinforcement Learning (DRL) has shown promise in generating effective strategies for complex problems. Building on previous research that combined Graph Neural Networks (GNN) and DRL to solve scheduling problems, this paper adapts that approach specifically to the EOSP. The findings indicate that the new method offers significant improvements over existing techniques. The main contributions of this work are a new representation of the problem that does not break time into discrete parts, the use of deep reinforcement learning for better problem-solving, and the use of graph representations as direct inputs to the learning model.
Graph Representations and Scheduling
The problem is approached by creating a large graph, referred to as the "discrete graph," that represents the sequential decisions involved in the observations. In this graph, each potential observation is linked to specific starting times. Each node shows a particular observation together with a start time drawn from its allowed range. Edges between nodes indicate that one observation can directly follow another given the time constraints; they account for how long each observation takes and the waiting period required before the next observation can begin. Notably, the same observation must never be scheduled twice, so two nodes representing the same observation at different start times can never both appear in a solution.
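A minimal sketch of how such a discrete graph could be assembled is shown below, assuming networkx for the graph structure and placeholder functions for the candidate start times and the maneuver durations; the paper's actual construction may differ.

```python
import itertools
import networkx as nx  # assumption: using networkx to hold the graph

def build_discrete_graph(requests, candidate_starts, transition_time):
    """requests: dict obs_id -> ObservationRequest; candidate_starts(r) yields
    start times inside r's visibility window; transition_time(a, b) is the
    maneuver duration between the two targets (all placeholders)."""
    g = nx.DiGraph()
    for r in requests.values():
        for t in candidate_starts(r):
            g.add_node((r.obs_id, t), utility=r.utility)
    for (a, ta), (b, tb) in itertools.permutations(g.nodes, 2):
        if a == b:
            continue  # same observation at two start times is never linked
        ra, rb = requests[a], requests[b]
        # edge a -> b if b can start at tb once a (started at ta) has ended
        # and the attitude maneuver toward b's target is complete
        if ta + ra.duration + transition_time(ra, rb) <= tb:
            g.add_edge((a, ta), (b, tb))
    return g
```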
Some edges in the graph are removed as an optimization. If a third observation can always be inserted between two observations without violating any time constraint, the direct edge between them is eliminated; this ensures that whenever the two observations appear in a solution, the intermediate observation is included as well.
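The sketch below illustrates one plausible reading of this pruning rule, removing an edge whenever a third observation can always be inserted between its endpoints; the exact dominance condition used in the paper may differ.

```python
def prune_dominated_edges(g, requests, transition_time):
    """Remove a -> b edges when some third observation always fits in between."""
    for (a, ta), (b, tb) in list(g.edges):
        ra, rb = requests[a], requests[b]
        for c, rc in requests.items():
            if c in (a, b):
                continue
            # earliest time c could start after a, respecting its own window
            earliest_c = max(ta + ra.duration + transition_time(ra, rc),
                             rc.window_start)
            end_c = earliest_c + rc.duration
            # c fits if it ends inside its window and still leaves time to reach b
            if (end_c <= rc.window_end and
                    end_c + transition_time(rc, rb) <= tb):
                g.remove_edge((a, ta), (b, tb))
                break
```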
To address the issue of the discrete graph becoming too large, a more compact representation, known as the continuous graph, was developed. The continuous graph includes each observation only once, without specifying its start time. Edges between its nodes (representing observations) are derived from the edges of the original discrete graph, so two nodes can be connected in both directions if the observations they represent can occur in either order.
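Collapsing the discrete graph into the continuous graph can be sketched as follows, again assuming networkx; the helper name is illustrative.

```python
import networkx as nx  # same assumption as in the earlier sketch

def to_continuous_graph(discrete_g):
    """One node per observation; an edge a -> b exists whenever any
    (a, t_a) -> (b, t_b) edge exists in the discrete graph."""
    cg = nx.DiGraph()
    for obs, _t in discrete_g.nodes:
        cg.add_node(obs)
    for (a, _ta), (b, _tb) in discrete_g.edges:
        cg.add_edge(a, b)
    return cg
```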
At each iteration of the algorithm, the system uses this compact continuous graph to decide which observation to schedule next. To keep track of which choices are blocked by this decision and which options remain available, the discrete graph is updated by committing the chosen observation to its earliest possible start time. This adjustment of the discrete graph is then reflected back into the continuous graph for the next iteration.
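The overall cycle can be summarized by the following sketch, where earliest_feasible_start and commit_observation are hypothetical helpers standing in for the discrete-graph update described above.

```python
def plan(policy, discrete_g, requests, transition_time):
    """One possible main loop; the policy picks observations from the
    continuous graph, and the discrete graph records the commitments."""
    schedule = []
    while True:
        continuous_g = to_continuous_graph(discrete_g)
        obs = policy.select(continuous_g, schedule)  # next observation, or None to stop
        if obs is None:
            break
        start = earliest_feasible_start(discrete_g, obs, schedule)  # hypothetical helper
        schedule.append((obs, start))
        discrete_g = commit_observation(discrete_g, obs, start)     # hypothetical helper
    return schedule
```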
Reinforcement Learning Approach
The approach used in this research treats the construction of an effective EOSP schedule as a reinforcement learning problem. In reinforcement learning, the goal is to solve a Markov Decision Process (MDP), a model for making decisions over a sequence of steps. An MDP consists of a set of states, a set of possible actions, a transition function describing how actions move the system between states, and a reward function.
In this case, a state is a specific situation within the EOSP graph: either the starting point, where no observations have been selected yet, or a later point where some observations are already in the schedule. The available actions depend on the last selected observation and allow further observations to be added in chronological order. MDPs are well suited to handling uncertainty in decision-making, but the researchers model the EOSP as deterministic: starting from a given state and taking a given action always leads to the same next state. Each scheduled observation contributes its utility, which serves as the reward in this model.
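A minimal sketch of this deterministic MDP view is given below; the environment class, its feasible_after callback, and the flat state tuple are illustrative assumptions, since in the paper the state is the scheduling graph itself.

```python
# Illustrative deterministic environment: the state holds the partial schedule
# and the remaining feasible observations; each step appends one observation
# and returns its utility as the reward.
class EospEnv:
    def __init__(self, requests, feasible_after):
        self.requests = requests              # dict: obs_id -> ObservationRequest
        self.feasible_after = feasible_after  # assumed callable: schedule -> set of obs ids
        self.schedule = []

    def reset(self):
        self.schedule = []
        return self.observation()

    def observation(self):
        return (tuple(self.schedule), frozenset(self.feasible_after(self.schedule)))

    def step(self, obs_id):
        assert obs_id in self.feasible_after(self.schedule)
        self.schedule.append(obs_id)
        reward = self.requests[obs_id].utility       # utility of the chosen observation
        done = not self.feasible_after(self.schedule)  # stop when nothing else fits
        return self.observation(), reward, done
```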
The goal is to maximize the total reward accumulated by these actions, building up the schedule that yields the best results. Although the MDP formulation is complete, the researchers do not provide full state information to the learning model. Instead of the entire discrete graph, which would be too large to process, they use the simplified continuous graph, which does not capture everything, in particular the exact durations of transitions. This makes the problem a partially observable MDP, but since the missing information mainly concerns estimated durations, this is not a major concern in the overall framework.
Learning Process and Graph Neural Networks
In this research, the approach is based on reinforcement learning: the system, referred to as the agent, is trained on graphs that represent partial schedules. The agent's task is to choose the next observation to schedule and to improve its performance based on rewards that measure how good the resulting schedules are. A simulator manages the different graphs and provides the necessary data to the learning agent. The agent works within a framework that maps states, the current conditions of the schedule, to actions, the scheduling choices it can make.
The goal of this learning process is to create a policy that allows the agent to maximize rewards across a variety of real-world problems that are used to train it. Importantly, the agent needs to adapt well to new problems it hasn't encountered before, meaning it should perform well without additional training on unfamiliar instances.
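A training loop for such a policy could look like the following REINFORCE-style sketch; this is only one plausible choice of DRL algorithm and is not necessarily the one used in the paper.

```python
def reinforce_update(policy, optimizer, instances, rollout):
    """rollout(policy, instance) is assumed to return the log-probabilities of
    the chosen actions and the per-step rewards of one complete schedule."""
    optimizer.zero_grad()
    loss = 0.0
    for inst in instances:
        log_probs, rewards = rollout(policy, inst)
        total_utility = sum(rewards)                   # return of the episode
        loss = loss - total_utility * sum(log_probs)   # REINFORCE objective
    (loss / len(instances)).backward()
    optimizer.step()
```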
The architecture of the system involves several processing stages. First, the input graphs are transformed to enable efficient communication among the different parts of the system. Then, small networks generate initial embeddings for the nodes and edges of the graphs. Next, a specialized type of neural network called a Message-Passing Graph Neural Network (MP-GNN) extracts features from the graphs and computes a probability for each candidate action. Finally, the reinforcement learning algorithm adjusts the parameters of the whole system based on the rewards it receives, refining both the initial embeddings and the MP-GNN.
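The sketch below illustrates this kind of pipeline with small node and edge embedding layers, a few rounds of message passing, and a policy head producing action probabilities; the layer sizes and update rule are assumptions rather than the paper's architecture.

```python
import torch
import torch.nn as nn

# Illustrative message-passing policy network: embed nodes and edges, run a few
# rounds of message passing, then score each node as a candidate action.
class MessagePassingPolicy(nn.Module):
    def __init__(self, node_dim, edge_dim, hidden=64, rounds=3):
        super().__init__()
        self.node_embed = nn.Linear(node_dim, hidden)
        self.edge_embed = nn.Linear(edge_dim, hidden)
        self.message = nn.Sequential(nn.Linear(3 * hidden, hidden), nn.ReLU())
        self.update = nn.GRUCell(hidden, hidden)
        self.policy_head = nn.Linear(hidden, 1)
        self.rounds = rounds

    def forward(self, node_feats, edge_index, edge_feats):
        # node_feats: [N, node_dim], edge_index: [2, E], edge_feats: [E, edge_dim]
        h = torch.relu(self.node_embed(node_feats))
        e = torch.relu(self.edge_embed(edge_feats))
        src, dst = edge_index
        for _ in range(self.rounds):
            msg = self.message(torch.cat([h[src], h[dst], e], dim=-1))
            agg = torch.zeros_like(h).index_add_(0, dst, msg)  # sum messages per node
            h = self.update(agg, h)
        logits = self.policy_head(h).squeeze(-1)                # one score per observation node
        return torch.softmax(logits, dim=-1)                    # action probabilities
```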




