Credit: own work, based on image generated by DALL·E 2.
We present a framework for solving multi-rendezvous spacecraft guidance problems with strict feasibility guarantees.
Abstract
Optimizing space vehicle routing is crucial for critical applications such as on-orbit servicing,
constellation deployment, and space debris de-orbiting. Multi-target Rendezvous presents a
significant challenge in this domain. This problem involves determining the optimal sequence
in which to visit a set of targets, and the corresponding optimal trajectories: this results in
a demanding NP-hard problem. We introduce a framework for the design and refinement of
multi-rendezvous trajectories based on heuristic combinatorial optimization and Sequential
Convex Programming. Our framework is both highly modular and capable of leveraging
candidate solutions obtained with advanced approaches and handcrafted heuristics. We
demonstrate this flexibility by integrating an Attention-based routing policy trained with
Reinforcement Learning to improve the performance of the combinatorial optimization
process. We show that Reinforcement Learning approaches for combinatorial optimization
can be effectively applied to spacecraft routing problems. We apply the proposed framework
to the UARX Space OSSIE mission: we are able to thoroughly explore the mission design
space, finding optimal tours and trajectories for a wide variety of mission scenarios.
Mission
Orbit Solutions to Simplify Injection and Exploration (OSSIE) is a modular Orbit Transfer Vehicle by UARX Space, designed to deliver multiple payloads, including PocketQubes, CubeSats, and small satellites, to Low Earth Orbit. It is propelled by four Dawn Aerospace B20 bi-propellant thrusters and is capable of multi-revolution impulsive manoeuvres. In this work we present the guidance system developed for OSSIE. The goal of the guidance system is to determine strictly feasible trajectories that minimize fuel consumption for multiple payload deployment missions, accounting for all factors that impact trajectory feasibility and cost, most importantly gravity gradient perturbations, the impact of the mass deployment sequence on propellant usage, and constraints on insertion and decommissioning orbits.
UARX Space OSSIE Orbit Transfer Vehicle. Refer to the official UARX Space OSSIE information page for up-to-date information on the vehicle and its commercial operations. Credit: UARX Space.
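To illustrate why the deployment sequence matters for propellant usage, the sketch below applies the Tsiolkovsky rocket equation backwards through a tour in which one payload is released after each transfer. It is a minimal illustration, not the OSSIE guidance model: the dry mass, payload masses, per-leg $\Delta V$ values, and specific impulse are hypothetical numbers chosen for the example, and the $\Delta V$ of each leg is assumed to be independent of the visiting order.

```python
import math

G0 = 9.80665  # standard gravity, m/s^2


def propellant_for_tour(dry_mass, payloads, delta_vs, isp):
    """Propellant (kg) to fly the legs in order, releasing one payload after each leg.

    The mission is traversed backwards: the mass before each burn follows from
    the rocket equation, m_before = m_after * exp(dv / (isp * g0)).
    """
    mass = dry_mass  # vehicle mass once every payload has been released
    propellant = 0.0
    for payload, dv in zip(reversed(payloads), reversed(delta_vs)):
        mass_after_burn = mass + payload  # payload is still on board during the burn
        mass_before_burn = mass_after_burn * math.exp(dv / (isp * G0))
        propellant += mass_before_burn - mass_after_burn
        mass = mass_before_burn
    return propellant


# Hypothetical values, for illustration only.
DRY_MASS = 200.0          # kg
DELTA_VS = [100.0, 100.0]  # m/s per leg, assumed equal for both orders
ISP = 300.0               # s, assumed specific impulse

heavy_first = propellant_for_tour(DRY_MASS, [50.0, 5.0], DELTA_VS, ISP)
light_first = propellant_for_tour(DRY_MASS, [5.0, 50.0], DELTA_VS, ISP)
print(f"heavy payload first: {heavy_first:.2f} kg")
print(f"light payload first: {light_first:.2f} kg")
```

With equal per-leg $\Delta V$, releasing the heavy payload first requires less propellant, because the heavy payload is carried through fewer burns. In a real mission the $\Delta V$ of each leg also depends on the order, which is what couples the routing problem to the mass sequence.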
Guidance Framework
We propose a modular solver architecture with a three-stage pipeline: heuristic optimization, trajectory re-optimization, and verification. The heuristic optimization stage determines optimal target sequences, integrating arbitrary hand-crafted and advanced solvers—exact or learned—through distance-based permutation sampling. The trajectory re-optimization stage refines these sequences to generate feasible and near-optimal trajectories, and the verification stage ensures compliance with mission requirements.
STSP solver architecture. Components shown in black are complex components, integrated through standardized interfaces, whose internal structure is outside the scope of this diagram.
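The text above does not spell out the permutation sampling scheme, so the following is only one plausible reading: a candidate tour from any solver is refined by sampling permutations within a small swap distance of the current best tour and keeping improvements. The function names, the swap-based distance, and the toy one-dimensional cost are all assumptions made for this sketch.

```python
import random


def swap_neighbour(tour, n_swaps, rng):
    """Return a copy of `tour` after `n_swaps` random pairwise swaps."""
    new = list(tour)
    for _ in range(n_swaps):
        i, j = rng.sample(range(len(new)), 2)
        new[i], new[j] = new[j], new[i]
    return new


def refine(candidate, cost, max_swaps=2, n_samples=200, seed=0):
    """Sample tours at most `max_swaps` swaps away from the best tour so far.

    `candidate` may come from any solver (hand-crafted, exact, or learned);
    only improving samples are accepted, so the result is never worse.
    """
    rng = random.Random(seed)
    best, best_cost = list(candidate), cost(candidate)
    for _ in range(n_samples):
        trial = swap_neighbour(best, rng.randint(1, max_swaps), rng)
        trial_cost = cost(trial)
        if trial_cost < best_cost:
            best, best_cost = trial, trial_cost
    return best, best_cost


# Toy cost (hypothetical): targets on a line, cost is the total distance travelled.
targets = [0.0, 5.0, 1.0, 4.0, 2.0, 3.0]


def tour_cost(tour):
    return sum(abs(targets[a] - targets[b]) for a, b in zip(tour, tour[1:]))


candidate = [0, 1, 2, 3, 4, 5]
best, best_cost = refine(candidate, tour_cost)
print(best, best_cost)
```

Because the refinement only needs a starting permutation and a cost function, it is indifferent to where the candidate came from, which is the kind of modularity the pipeline relies on.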
Neural Combinatorial Optimization for Multi-Rendezvous Spacecraft Guidance
An attention-based routing policy was implemented for the OSSIE mission’s Space Traveling Salesman Problem. The policy consists of an encoder-decoder network first introduced by Kool et al. (2015) and is trained via Reinforcement Learning (RL). The RL4CO Neural Combinatorial Optimization library was used to implement and train the policy using the REINFORCE, Advantage Actor-Critic, and Proximal Policy Optimization RL algorithms on 100,000 ten-transfer mission scenarios.
Architecture of the autoregressive, attention-based policy used in this work. The encoder comprises a fully connected network and a Graph Attention Network (GAT) with a feedforward layer. Edge embeddings are omitted as the STSP graph is fully connected. The decoder constructs at each step a context embedding $\mathbf{Q}$ used as the query for the Pointer Network (PN) attention mechanism. This diagram is based on the general RL4CO policy architecture diagram by Berto et al. (2024). Refer to Kool et al. (2015) and Vinyals, Fortunato and Jaitly (2017) for more information about the internal structure of the GAT and PN, respectively.
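The decoding step in the caption above can be sketched as a single-head pointer attention: the context query is compared with each node's key, the scores are clipped with a scaled $\tanh$ as in Kool et al., already visited targets are masked out, and a softmax yields the probability of visiting each remaining target next. This is a simplified illustration in plain Python; the learned projection matrices are omitted, and the function name, the clipping constant of 10, and the example vectors are assumptions rather than details confirmed for this work.

```python
import math


def pointer_probabilities(query, keys, visited, clip=10.0):
    """Probability of selecting each node next, given a context query.

    query:   context embedding (list of floats), playing the role of Q
    keys:    one key vector per node (projections assumed already applied)
    visited: booleans; visited nodes are masked and receive probability 0
    """
    if all(visited):
        raise ValueError("every node has been visited; nothing left to select")
    scale = math.sqrt(len(query))
    logits = []
    for key, seen in zip(keys, visited):
        if seen:
            logits.append(-math.inf)  # mask: exp(-inf) = 0 in the softmax
        else:
            score = sum(q * k for q, k in zip(query, key)) / scale
            logits.append(clip * math.tanh(score))
    top = max(logits)  # subtract the maximum for numerical stability
    exps = [math.exp(value - top) for value in logits]
    total = sum(exps)
    return [e / total for e in exps]


# Hypothetical two-dimensional embeddings; the third target is already visited.
probs = pointer_probabilities(
    query=[1.0, 0.0],
    keys=[[1.0, 0.0], [0.0, 1.0], [2.0, 0.0]],
    visited=[False, False, True],
)
print(probs)
```

Greedy decoding picks the most probable target at each step, sampling draws from this distribution, and Beam Search keeps several partial tours ranked by their cumulative log-probability.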
REINFORCE provided the best performance, achieving a mean optimality gap of 3.02% compared to heuristic solutions when employing Beam Search. While this performance is significant, it may not directly generalize to fully dynamic STSPs due to the near-static nature of the problem without RAAN targeting. However, the results are promising for future research on spacecraft autonomy in multi-rendezvous missions using Neural Combinatorial Optimization methods.
Validation reward curve (expressed as a mean optimality gap with respect to the solutions obtained using Heuristic Combinatorial Optimization) with REINFORCE, A2C and PPO.
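For reference, a mean optimality gap of this kind is commonly computed as the average relative cost excess of the policy's tour over the reference tour. The notation below is an assumption introduced here: $J_i^{\text{policy}}$ and $J_i^{\text{heur}}$ denote the cost of the policy's tour and of the heuristic solution for validation scenario $i$, and $N$ is the number of scenarios.

$$
\text{gap} = \frac{100\%}{N} \sum_{i=1}^{N} \frac{J_i^{\text{policy}} - J_i^{\text{heur}}}{J_i^{\text{heur}}}
$$

Under this definition, a gap of 3.02% means the policy's tours cost on average 3.02% more than the heuristic reference solutions.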
Mission Analysis
A Monte Carlo analysis of 5,000 mission scenarios was performed using OSSIE’s nominal payload list. The study concluded that fuel consumption and mission cost are mainly influenced by the number of payload bundles and the inclination range. In all cases, OSSIE is capable of completing its mission and decommissioning sequence.
The Sequential Convex Programming (SCP) algorithm was validated for trajectory re-optimization, successfully adapting transfer manoeuvres to spacecraft constraints with minimal impact on injection errors or $\Delta V$. In non-coplanar scenarios, SCP reduced propellant consumption by 8.5% compared to the combinatorial approximation, although complete eccentricity elimination was not achieved.
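As a reminder of how SCP works in general, each iteration $k$ solves a convex subproblem obtained by linearizing the dynamics around the previous trajectory $(\bar{\mathbf{x}}^k, \bar{\mathbf{u}}^k)$ and restricting the step with a trust region. The formulation below is a generic textbook form, not the specific problem solved in this work; the symbols $J$, $f$, $A_i^k$, $B_i^k$, $\rho_k$, and the convex sets $\mathcal{X}$, $\mathcal{U}$ are assumptions introduced for illustration.

$$
\begin{aligned}
\min_{\mathbf{x}, \mathbf{u}} \quad & J(\mathbf{x}, \mathbf{u}) \\
\text{s.t.} \quad & \mathbf{x}_{i+1} = f(\bar{\mathbf{x}}_i^k, \bar{\mathbf{u}}_i^k) + A_i^k (\mathbf{x}_i - \bar{\mathbf{x}}_i^k) + B_i^k (\mathbf{u}_i - \bar{\mathbf{u}}_i^k), \\
& \lVert \mathbf{x}_i - \bar{\mathbf{x}}_i^k \rVert \le \rho_k, \\
& \mathbf{x}_i \in \mathcal{X}, \quad \mathbf{u}_i \in \mathcal{U},
\end{aligned}
$$

where $A_i^k = \partial f / \partial \mathbf{x}$ and $B_i^k = \partial f / \partial \mathbf{u}$ are evaluated at $(\bar{\mathbf{x}}_i^k, \bar{\mathbf{u}}_i^k)$. The solution becomes the reference for iteration $k+1$, and the process repeats until the trajectory stops changing within a tolerance.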
Verification in a high-fidelity simulator confirmed that the optimized trajectories meet mission requirements. Additional $\Delta V$ was observed in simulations due to pointing correction manoeuvres, as the spacecraft uses Dawn Aerospace B1 thrusters for attitude control. Total transfer cost remained feasible for OSSIE.
Mission cost in fuel mass, $\Delta V$, and time of flight (TOF), as a function of the number of bundles, the inclination range and standard deviation, and the semi-major axis range and standard deviation.
Conclusion
An optimization framework for the Multi-target Rendezvous problem has been developed and applied to the UARX Space OSSIE mission. It determines optimal target sequences and near fuel-optimal trajectories, integrating a Reinforcement Learning-trained Attention-based routing policy. The framework accommodates OSSIE’s propulsion requirements, adapts to mission constraints, and supports future extensions.
Acknowledgements
This work is the result of a collaborative effort between the Delft University of Technology and SENER Aerospace & Defence. We would like to thank Marc Naeije from TU Delft for his endorsement of the project and for his keen supervision and advice. We would also like to extend our gratitude to Mercedes Ruiz for making this research possible at SENER Aerospace & Defence. Lastly, we thank UARX Space for entrusting us with OSSIE’s maiden flight, the first of many missions to come. Ad astra.
BibTeX
@inproceedings{lopez_rivera_design_2024,
  title={Design and {Optimization} of {Multi}-{Rendezvous} {Maneuvres} based on {Reinforcement} {Learning} and {Convex} {Optimization}},
  url={https://dl.iafastro.directory/event/IAC-2024/paper/87909/},
  booktitle={Proceedings of the 75th {International} {Astronautical} {Congress}},
  publisher={International Astronautical Federation},
  author={López Rivera, Antonio and Marcovaldi, Lucrezia and Ramírez Sánchez, Jesús Fernando and Alex, Cuenca and David, Bermejo},
  year={2024},
  pages={18},
}