Sotiris Anagnostidis

About Me

I am a fourth-year Ph.D. student in the Data Analytics Lab at ETH Zürich, supervised by Thomas Hofmann. I am interested in understanding how models learn from data and how we can make them more efficient and robust. During my Ph.D., I have completed internships at Google DeepMind and Meta GenAI. Prior to my Ph.D., I completed an M.Sc. in Data Science at ETH Zürich and a Diploma in Computer Engineering at the National Technical University of Athens.

News

April 2025

I will be joining Anthropic to work on multi-modal pretraining!

April 2025

Two of our papers were accepted at CVPR, one as a Highlight and one as an Oral.

March 2025

Our paper on speculative decoding was accepted as an Oral at ICLR, see you in Singapore!

July 2024

Our paper on sycophancy was accepted at COLM, see you in Philadelphia!

June 2024

I will be joining Meta GenAI until mid-November, working on topics around VLM inference efficiency!

May 2024

I will be presenting our papers on navigating scaling laws (Spotlight) and concept guidance in LLMs at ICML.

Jan 2024

I will be presenting our papers on fusing transformer models and meta-pruning (Spotlight) at ICLR.

Sep 2023

I will be presenting our papers on dynamic context pruning (Spotlight), OpenAssistant conversations (Oral) and scaling MLPs at NeurIPS!

Nov 2023

I will be joining Google DeepMind until the end of the year, working on personalized evaluation and finetuning of VLMs.

Aug 2023

I will be presenting our paper on using RL for dynamic graph predictions at ICCV.

Aug 2023

Our paper using CLIP for 3D scenes got an Oral at BMVC!

May 2023

Our paper on distillation will be presented at ICML.

Featured Publications

Navigating Scaling Laws: Compute Optimality in Adaptive Model Training

We explore adaptive training methods for deep learning models, challenging the traditional static model paradigm that follows fixed neural scaling laws. By allowing models to change their "shape" during training, we propose to optimally traverse scaling laws, reducing the computational resources required for training. We do this for a variety of "shapes" and models across different modalities.

Dynamic Context Pruning for Efficient and Interpretable Autoregressive Transformers

We introduce a dynamic context pruning method for autoregressive Transformers to improve efficiency and interpretability when processing long sequences. We learn to prune uninformative tokens dynamically during inference, reducing memory and computational costs while maintaining model expressiveness. The pruning mechanism, which is controlled by a sparsity parameter, can be easily integrated into pre-trained models via fine-tuning.

Scaling MLPs: A Tale of Inductive Bias

We investigate the limits of multi-layer perceptrons (MLPs), addressing the gap between their theoretical prominence and practical underuse in deep learning. We demonstrate that MLPs, despite lacking vision-specific inductive biases, achieve strong performance when scaled effectively, highlighting that a lack of inductive bias can be compensated for with sufficient compute and data.

OpenAssistant Conversations - Democratizing Large Language Model Alignment

We address the challenge of aligning large language models (LLMs) with human preferences to enhance usability and accessibility. While alignment techniques like supervised fine-tuning (SFT) and reinforcement learning from human feedback (RLHF) improve model performance, they rely on expensive, proprietary high-quality human feedback data. To democratize alignment research, we introduce the OpenAssistant Conversations dataset, and train large models to evaluate its effectiveness.

Signal Propagation in Transformers: Theoretical Perspectives and the Role of Rank Collapse

We investigate the phenomenon of rank collapse in Transformers, where token representations become highly aligned or degenerate at initialization, leading to vanishing gradients. We identify this issue as a significant barrier to training, particularly in deeper networks. By analyzing the underlying causes, we propose a depth-dependent scaling of residual branches to mitigate rank collapse and stabilize token representations.

Contact

You can reach me at: