Our goal is to develop a robot foundation model that can perform tasks across diverse environments.
Throughout my career, I have focused on advancing robot motion by integrating control, perception, and learning. My current research focuses on enabling robots to achieve faster, more accurate, and more adaptable motion.
As a university student, I worked on control technologies for machine tools and industrial equipment through industry–academia collaborative research. The objective was clear: to improve productivity by understanding and refining the principles that allow machines to behave precisely as designed. After joining Toyota Central R&D Labs., my research expanded to more general-purpose robotic systems. I explored how information such as joint angles, applied forces, and wheel–body interactions should be modeled and controlled. Building on a foundation in control theory, I gradually shifted toward research with broader applicability.
Around 2015, a proposal from my supervisor encouraged me to explore AI research. That opportunity led me to the U.S. West Coast, where I began working at the forefront of a rapidly evolving field.
A major turning point in my career was my time overseas. I spent two years at Stanford and, starting in 2022, four years at UC Berkeley (University of California, Berkeley).
Looking back, I feel fortunate to have conducted research abroad during a period of transformative progress in AI. Experiencing that momentum firsthand profoundly shaped my development as a researcher.
Around 2017, when I was at Stanford, Generative Adversarial Networks (GANs), which train competing generator and discriminator models, were attracting attention. Their applications quickly expanded beyond image generation to image completion, domain translation, and simulation generation, heightening recognition that AI could autonomously generate representations close to the real world. In computer vision, deep learning dramatically advanced both image understanding and image generation, accelerating the adoption of simulation environments and vision-based control methods in robotics.
By 2021–2022, just before my move to UC Berkeley, diffusion models, which learn to generate images by progressively removing noise, had begun to attract attention. Their ability to generate images of unprecedented quality marked a major turning point in generative AI research. Researchers were also exploring applications beyond images, including video generation, 3D content generation, and robot action generation. This growing effort to connect generation and action became one of the driving forces behind the rapid emergence of robot learning as a major research field.
At the same time, foundation models such as large language models (LLMs) were gaining momentum, and their influence quickly reached robotics. Vision-Language-Action (VLA) models that generate actions directly from visual input and language instructions are now driving the next wave of progress in robot learning.
At UC Berkeley, amid the growing importance of AI in robotics, I focused on robot learning research that links visual perception with action generation. In particular, I worked on methods that learn end-to-end mappings from high-dimensional sensory inputs, such as camera images, directly to robot control.
The key challenge was not simply using neural networks to predict actions. Rather, it was how to incorporate control-oriented structures—such as robot kinematics and dynamics—into the learning process. Drawing on my background in control theory, I explored learning methods that enable robots to operate robustly and reliably in real-world environments, emphasizing model-based approaches, stability, and reproducibility.
Through this work, I came to appreciate that robot learning is far more than an application of AI. It is an inherently interdisciplinary field in which control, perception, and learning must be integrated at a high level to achieve meaningful progress.

Photo: Dr. Hirose with a robot used for VLA research at UC Berkeley.
My experience abroad fundamentally changed both my perspective on research and my understanding of what it means to be a researcher. One of the individuals who influenced me most was Associate Professor Sergey Levine at UC Berkeley.
Professor Levine set an extraordinary example through both his work ethic and his commitment to mentoring students. His approach shaped the culture of the entire laboratory. As soon as one paper was completed, attention shifted to the next challenge. Lab members tackled head-on the difficult and demanding topics that others often avoided. Through this environment, I learned that research is, by its nature, highly competitive, and that confronting challenging problems rather than avoiding them is often what leads to meaningful breakthroughs.
In the United States, even securing research opportunities is highly competitive. Unlike environments where research opportunities are relatively accessible, students must demonstrate their abilities early in order to earn a place at the table. Witnessing this firsthand reinforced the importance of understanding one’s strengths and communicating them effectively.
What I value most as a researcher is understanding what uniquely differentiates me. In Japan, specialization is often regarded as the most important asset, but specialization alone is not enough to expand research horizons. My background in control is one of my greatest strengths, yet relying solely on that foundation would eventually limit the scope of my work. It is important to maintain a strong core expertise while continuously extending into new areas.
Additionally, the ability to collaborate across disciplines and the ability to build a network of colleagues are valuable soft skills. During my time at Stanford, I had the opportunity to work with a highly accomplished researcher who was adept at motivating others to contribute to his research. He communicated his vision clearly, which inspired others to collaborate, and as a result his research advanced steadily. Working abroad taught me the importance of building and expanding networks.
In March 2026, I returned to Japan for the first time in four years and have since been based at the Tokyo campus of Toyota Central R&D Labs., Inc. There, I established a specialized team dedicated to integrating AI into robotics. Our research focuses on robot learning, particularly Vision-Language-Action (VLA) models, which generate actions directly from visual inputs captured by cameras and language-based instructions.
Our goal is to realize robots that can perform tasks across diverse environments, including factories, restaurants, homes, and offices. Developing the robot foundation models that will make this possible has become the central focus of my research.
One of the most compelling aspects of robot learning is that robots can autonomously acquire experience through interaction with their environment. Even if a robot initially achieves only 70-80% performance, it can continue learning from those interactions and steadily improve over time. My aim is to create robots that enhance their abilities through self-learning, so that they can tackle a wide range of tasks in varied environments with the same flexibility as humans. Ultimately, I want to make these capabilities widely applicable across many different types of robots.
Robot learning is a field undergoing rapid global progress. I look forward to working alongside talented researchers who share this vision and contributing to the next generation of robotics.