Tiansheng Wen
email: neilwen987 _at_ gmail.com

| CV | Google Scholar | GitHub | X | Email |

Howdy! Welcome to my home page. I am a second-year M.S. student at Xidian University, advised by Prof. Bo Chen. Concurrently, I serve as a Research Intern at Stony Brook University, working with Prof. Chenyu You. Prior to my graduate studies, I received my B.S. degree from Xidian University in 2023.

🔥 I am actively seeking a PhD position in the US for Fall 2026.

Please feel free to reach out via email if you believe I would be a good fit for your research team; I welcome the opportunity for further discussion! See my CV for more details.

🧐 Research Interests

My primary research goal is to develop scalable, reliable, and efficient methods for machine learning and generative AI, mainly focusing on:

  • Bayesian methods for disentangled representations and uncertainty estimation
  • Alignment and safety of foundation models, including LLMs, VLMs, and diffusion models

In addition, I am also highly interested in:

  • 📚 Memorization in large models
  • 🔄 Self-consuming/self-improving loops
  • 🤖 Agent learning with foundation models

If you share these research interests, feel free to reach out or add me on WeChat.

🚀🚀 News
  • [03/2025] Code for our paper CSR has been released, and we were invited to publish the model on Hugging Face! ⚙️⚙️
  • [02/2025] One paper was accepted by CVPR 2025! 🎉🎉
  • [07/2024] Our paper HICE-Score was accepted by ACM MM 2024! 🎉🎉

Publications

Beyond Matryoshka: Revisiting Sparse Coding for Adaptive Representation
Tiansheng Wen*, Yifei Wang*, Zequn Zeng, Zhong Peng, Yudi Su, Xinyang Liu, Bo Chen, Hongwei Liu, Stefanie Jegelka, Chenyu You
arXiv, 2025

abstract | paper | code |

Many large-scale systems rely on high-quality deep representations (embeddings) to facilitate tasks like retrieval, search, and generative modeling. Matryoshka Representation Learning (MRL) recently emerged as a solution for adaptive embedding lengths, but it requires full model retraining and suffers from noticeable performance degradations at short lengths. In this paper, we show that sparse coding offers a compelling alternative for achieving adaptive representation with minimal overhead and higher fidelity. We propose Contrastive Sparse Representation (CSR), a method that sparsifies pre-trained embeddings into a high-dimensional but selectively activated feature space. By leveraging lightweight autoencoding and task-aware contrastive objectives, CSR preserves semantic quality while allowing flexible, cost-effective inference at different sparsity levels. Extensive experiments on image, text, and multimodal benchmarks demonstrate that CSR consistently outperforms MRL in terms of both accuracy and retrieval speed, often by large margins, while also cutting training time to a fraction of that required by MRL. Our results establish sparse coding as a powerful paradigm for adaptive representation learning in real-world applications where efficiency and fidelity are both paramount.
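
A minimal sketch of the core idea (not the authors' released implementation): a TopK sparse autoencoder over frozen pre-trained embeddings, trained with a reconstruction term plus an InfoNCE-style contrastive term. The dimensions, the value of k, and the `emb_pos` positive-pair batch below are illustrative assumptions.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class TopKSparseAutoencoder(nn.Module):
    def __init__(self, embed_dim=768, latent_dim=8192, k=32):
        super().__init__()
        self.encoder = nn.Linear(embed_dim, latent_dim)
        self.decoder = nn.Linear(latent_dim, embed_dim)
        self.k = k  # number of active latent units per embedding

    def encode(self, x):
        z = F.relu(self.encoder(x))
        topk = torch.topk(z, self.k, dim=-1)        # keep the k largest activations
        mask = torch.zeros_like(z).scatter_(-1, topk.indices, 1.0)
        return z * mask                             # sparse, selectively activated code

    def forward(self, x):
        z = self.encode(x)
        return self.decoder(z), z

def csr_style_loss(model, emb, emb_pos, temperature=0.07):
    """emb, emb_pos: (B, d) frozen embeddings of positive pairs."""
    recon, z = model(emb)
    recon_loss = F.mse_loss(recon, emb)             # autoencoding fidelity
    z_pos = model.encode(emb_pos)
    logits = F.normalize(z, dim=-1) @ F.normalize(z_pos, dim=-1).T
    labels = torch.arange(z.size(0))                # matched pairs lie on the diagonal
    contrastive = F.cross_entropy(logits / temperature, labels)
    return recon_loss + contrastive
```

Varying k at inference time then trades accuracy against retrieval cost without retraining the base encoder.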


Contrastive Factor Analysis
Zhibin Duan*, Tiansheng Wen*, Yifei Wang, Chen Zhu, Bo Chen, Mingyuan Zhou
arXiv, 2024

abstract | paper |

Factor analysis, often regarded as a Bayesian variant of matrix factorization, offers superior capabilities in capturing uncertainty, modeling complex dependencies, and ensuring robustness. With the arrival of the deep learning era, however, factor analysis has received less and less attention due to its limited expressive ability. In contrast, contrastive learning has emerged as a potent technique with demonstrated efficacy in unsupervised representation learning. While the two methods are different paradigms, recent theoretical analysis has revealed a mathematical equivalence between contrastive learning and matrix factorization, opening the possibility of combining factor analysis with contrastive learning. Motivated by the interconnectedness of contrastive learning, matrix factorization, and factor analysis, this paper introduces a novel Contrastive Factor Analysis framework that aims to leverage factor analysis's advantageous properties within the realm of contrastive learning. To further exploit the interpretability of non-negative factor analysis, which can learn disentangled representations, contrastive factor analysis is extended to a non-negative version. Finally, extensive experimental validation showcases the efficacy of the proposed contrastive (non-negative) factor analysis methodology across multiple key properties, including expressiveness, robustness, interpretability, and accurate uncertainty estimation.
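
As a toy illustration of the marriage described above (a sketch under assumptions, not the paper's model): a Gaussian-posterior encoder plays the role of amortized factor inference, a bias-free linear decoder acts as the factor-loading matrix, and two augmented views of each input supply the contrastive positives. All layer sizes and loss weights are made up for the example.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class ContrastiveFA(nn.Module):
    def __init__(self, data_dim=784, n_factors=64):
        super().__init__()
        self.enc = nn.Linear(data_dim, 2 * n_factors)        # posterior mean, log-variance
        self.W = nn.Linear(n_factors, data_dim, bias=False)  # factor-loading matrix

    def posterior(self, x):
        mu, logvar = self.enc(x).chunk(2, dim=-1)
        z = mu + torch.randn_like(mu) * (0.5 * logvar).exp()  # reparameterized sample
        return z, mu, logvar

def elbo_plus_contrastive(model, view1, view2, temperature=0.1):
    z1, mu, logvar = model.posterior(view1)
    z2, _, _ = model.posterior(view2)
    recon = F.mse_loss(model.W(z1), view1)                    # linear generative term
    kl = -0.5 * (1 + logvar - mu.pow(2) - logvar.exp()).sum(-1).mean()
    sim = F.normalize(z1, dim=-1) @ F.normalize(z2, dim=-1).T
    nce = F.cross_entropy(sim / temperature, torch.arange(z1.size(0)))
    return recon + 1e-3 * kl + nce                            # illustrative weighting
```

A non-negative variant would additionally constrain the latents to be non-negative, e.g., via a softplus transform or a Weibull posterior in place of the Gaussian.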


HICEScore: A Hierarchical Metric for Image Captioning Evaluation
Zequn Zeng, Jianqiao Sun, Hao Zhang, Tiansheng Wen, Yudi Su, Yan Xie, Zhengjue Wang, Bo Chen
ACM MM, 2024

abstract | paper | code |

Image captioning evaluation metrics can be divided into two categories: reference-based metrics and reference-free metrics. However, reference-based approaches may struggle to evaluate descriptive captions with abundant visual details produced by advanced multimodal large language models, due to their heavy reliance on limited human-annotated references. In contrast, previous reference-free metrics have proven effective via CLIP cross-modality similarity. Nonetheless, CLIP-based metrics, constrained by their reliance on global image-text compatibility, often fail to detect local textual hallucinations and are insensitive to small visual objects. Moreover, their single-scale designs cannot provide an interpretable evaluation process, such as pinpointing the position of caption mistakes or identifying visual regions that have not been described. To move forward, we propose a novel reference-free metric for image captioning evaluation, dubbed Hierarchical Image Captioning Evaluation Score (HICE-S). By detecting local visual regions and textual phrases, HICE-S builds an interpretable hierarchical scoring mechanism, breaking through the barriers of the single-scale structure of existing reference-free metrics. Comprehensive experiments indicate that our proposed metric achieves SOTA performance on several benchmarks, outperforming existing reference-free metrics like CLIP-S and PAC-S, and reference-based metrics like METEOR and CIDEr. Moreover, several case studies reveal that the assessment process of HICE-S on detailed captions closely resembles interpretable human judgments. Our code is available at https://github.com/joeyz0z/HICE.
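
A rough sketch of the hierarchical idea (illustrative only; the actual metric lives at https://github.com/joeyz0z/HICE): combine a global image-caption similarity with per-phrase best-region matches, so a hallucinated phrase that matches no region drags the local score down instead of being averaged away. The shared embedding space and the `alpha` weighting are assumptions of this example.

```python
import torch
import torch.nn.functional as F

def hierarchical_score(image_emb, caption_emb, region_embs, phrase_embs, alpha=0.5):
    """image_emb: (d,) global image embedding; caption_emb: (d,) caption embedding;
    region_embs: (R, d) detected image regions; phrase_embs: (P, d) caption phrases.
    All embeddings are assumed to share one space, e.g., CLIP's."""
    global_sim = F.cosine_similarity(image_emb, caption_emb, dim=0)
    # Score each phrase by its best-matching region; which region wins also
    # localizes the evidence, giving the interpretable part of the evaluation.
    sim = F.normalize(phrase_embs, dim=-1) @ F.normalize(region_embs, dim=-1).T
    local_sim = sim.max(dim=-1).values.mean()
    return alpha * global_sim + (1 - alpha) * local_sim
```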


A Non-negative VAE: the Generalized Gamma Belief Network
Zhibin Duan, Tiansheng Wen, Muyao Wang, Bo Chen, Mingyuan Zhou
arXiv, 2024

abstract | paper |

The gamma belief network (GBN), often regarded as a deep topic model, has demonstrated its potential for uncovering multi-layer interpretable latent representations in text data. Its notable capability to acquire interpretable latent factors is partially attributed to sparse and non-negative gamma-distributed latent variables. However, the existing GBN and its variations are constrained by the linear generative model, thereby limiting their expressiveness and applicability. To address this limitation, we introduce the generalized gamma belief network (Generalized GBN) in this paper, which extends the original linear generative model to a more expressive non-linear generative model. Since the parameters of the Generalized GBN no longer possess an analytic conditional posterior, we further propose an upward-downward Weibull inference network to approximate the posterior distribution of the latent variables. The parameters of both the generative model and the inference network are jointly trained within the variational inference framework. Finally, we conduct comprehensive experiments on both expressivity and disentangled representation learning tasks to evaluate the performance of the Generalized GBN against state-of-the-art Gaussian variational autoencoders serving as baselines.
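
The piece that makes such non-negative latents trainable by backpropagation is the Weibull reparameterization: a Weibull(k, lam) sample can be drawn differentiably from a uniform via its inverse CDF. Below is a minimal single-layer sketch of an inference layer built on it; the layer sizes are illustrative, and the paper's full upward-downward network is more involved.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

def sample_weibull(k, lam, eps=1e-8):
    # Inverse-CDF trick: u ~ Uniform(0,1)  =>  lam * (-log(1-u))**(1/k) ~ Weibull(k, lam)
    u = torch.rand_like(k).clamp(eps, 1 - eps)
    return lam * (-torch.log1p(-u)) ** (1.0 / k)

class WeibullInferenceLayer(nn.Module):
    def __init__(self, in_dim=784, latent_dim=128):
        super().__init__()
        self.h = nn.Linear(in_dim, latent_dim)
        self.k_head = nn.Linear(latent_dim, latent_dim)
        self.lam_head = nn.Linear(latent_dim, latent_dim)

    def forward(self, x):
        h = F.relu(self.h(x))
        k = F.softplus(self.k_head(h)) + 1e-4      # Weibull shape, kept positive
        lam = F.softplus(self.lam_head(h)) + 1e-4  # Weibull scale, kept positive
        return sample_weibull(k, lam)              # non-negative latent sample
```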

Professional Activities
  • Journal Reviewer: TNNLS

Template from this awesome website.