I've been meaning to read through the interesting papers from each of the major conferences for a while, but I never figured out how to actually go about it. Thinking it over now, the first step is to put together a list.

Concretely, the plan is to focus on the Oral papers first. This master list is mainly there to enumerate which papers exist, together with supplementary information for each one: title, area, abstract, and authors. For the abstracts I will also throw in a Chinese translation. Since the list consists mainly of Oral papers, other papers may be pasted in later as I come across them, so the Oral papers are not marked separately.

So this is basically grunt work: get it done first, polish later. Let's get started!

# Test of Time Award

DeCAF: A Deep Convolutional Activation Feature for Generic Visual Recognition, Jeffrey Donahue, Yangqing Jia, Oriol Vinyals, Judy Hoffman, Ning Zhang, Eric Tzeng, Trevor Darrell.

Abstract: We evaluate whether features extracted from the activation of a deep convolutional network trained in a fully supervised fashion on a large, fixed set of object recognition tasks can be re-purposed to novel generic tasks. Our generic tasks may differ significantly from the originally trained tasks and there may be insufficient labeled or unlabeled data to conventionally train or adapt a deep architecture to the new tasks. We investigate and visualize the semantic clustering of deep convolutional features with respect to a variety of such tasks, including scene recognition, domain adaptation, and fine-grained recognition challenges. We compare the efficacy of relying on various network levels to define a fixed feature, and report novel results that significantly outperform the state-of-the-art on several important vision challenges. We are releasing DeCAF, an open-source implementation of these deep convolutional activation features, along with all associated network parameters to enable vision researchers to be able to conduct experimentation with deep representations across a range of visual concept learning paradigms.

摘要翻译:我们评估了从在一组大型且固定的物体识别任务上以完全监督方式训练的深度卷积网络的激活中提取的特征,是否可以重新用于新的通用任务。我们的通用任务可能与最初训练的任务有很大不同,并且可能没有足够的标注或未标注数据来按常规方式训练或调整深度架构以适应新任务。我们研究并可视化了深度卷积特征在各种此类任务中的语义聚类,包括场景识别、领域自适应和细粒度识别挑战。我们比较了依赖不同网络层次来定义固定特征的有效性,并报告了在几个重要的视觉挑战中明显优于最先进技术的新结果。我们正在发布 DeCAF,这是这些深度卷积激活特征的开源实现,以及所有相关的网络参数,以使视觉研究人员能够在一系列视觉概念学习范式中使用深度表示进行实验。

This is work by Yangqing Jia and colleagues, and the precursor to Caffe.
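
The recipe in the abstract is simple enough to restate in code: freeze a network trained with full supervision on ImageNet, read out an intermediate activation as a fixed feature, and train only a light classifier for the new task. Below is a minimal sketch of that idea with today's tooling; the choice of torchvision's AlexNet and of the first fully-connected layer as the readout are my assumptions for illustration, not details taken from the paper.

```python
# Minimal sketch of the DeCAF idea: reuse activations from a frozen,
# supervised CNN as generic features for a new task, and train only a
# linear probe on top. AlexNet and the fc6-style readout are assumptions.
import torch
import torch.nn as nn
from torchvision import models

backbone = models.alexnet(weights=models.AlexNet_Weights.IMAGENET1K_V1)
backbone.eval()  # frozen feature extractor, never fine-tuned

def decaf_features(images: torch.Tensor) -> torch.Tensor:
    """Return fixed 4096-d features (roughly a DeCAF6-style readout)."""
    with torch.no_grad():
        x = backbone.features(images)           # conv stack
        x = backbone.avgpool(x).flatten(1)      # pooled conv activations
        x = backbone.classifier[:3](x)          # dropout (inactive), first FC, ReLU
    return x

# Train only a linear classifier for the new task (e.g. scene recognition).
num_classes = 10  # placeholder for the target task
probe = nn.Linear(4096, num_classes)
optimizer = torch.optim.SGD(probe.parameters(), lr=1e-2)
criterion = nn.CrossEntropyLoss()

images = torch.randn(8, 3, 224, 224)            # stand-in batch
labels = torch.randint(0, num_classes, (8,))
loss = criterion(probe(decaf_features(images)), labels)
loss.backward()
optimizer.step()
```

Swapping in a different backbone or readout layer only changes the `decaf_features` helper; the overall "fixed deep feature plus simple classifier" pattern is the point.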

# Best Paper

  1. Debating with More Persuasive LLMs Leads to More Truthful Answers, Akbir Khan, John Hughes, Dan Valentine, Laura Ruis, Kshitij Sachan, Ansh Radhakrishnan, Edward Grefenstette, Samuel Bowman, Tim Rocktäschel, Ethan Perez.

    Abstract: Common methods for aligning large language models (LLMs) with desired behaviour heavily rely on human-labelled data. However, as models grow increasingly sophisticated, they will surpass human expertise, and the role of human evaluation will evolve into non-experts overseeing experts. In anticipation of this, we ask: can weaker models assess the correctness of stronger models? We investigate this question in an analogous setting, where stronger models (experts) possess the necessary information to answer questions and weaker models (non-experts) lack this information. The method we evaluate is debate, where two LLM experts each argue for a different answer, and a non-expert selects the answer. We find that debate consistently helps both non-expert models and humans answer questions, achieving 76% and 88% accuracy respectively (naive baselines obtain 48% and 60%). Furthermore, optimising expert debaters for persuasiveness in an unsupervised manner improves non-expert ability to identify the truth in debates. Our results provide encouraging empirical evidence for the viability of aligning models with debate in the absence of ground truth.

    摘要翻译:将大型语言模型 (LLM) 与期望行为对齐的常用方法严重依赖于人工标注的数据。然而,随着模型变得越来越强大,它们将超越人类的专业知识,人类评估的角色也将演变为由非专家来监督专家。有鉴于此,我们提出这样一个问题:较弱的模型能否评估较强模型的正确性?我们在一个类比的设定中研究这个问题:较强的模型(专家)拥有回答问题所需的信息,而较弱的模型(非专家)缺乏这些信息。我们评估的方法是辩论,即两个 LLM 专家各自为不同的答案辩护,再由非专家选出答案。我们发现辩论能够稳定地帮助非专家模型和人类回答问题,分别达到 76% 和 88% 的准确率(朴素基线分别为 48% 和 60%)。此外,以无监督的方式优化专家辩手的说服力,可以提高非专家在辩论中识别真相的能力。我们的结果为在缺乏真实标准答案的情况下通过辩论来对齐模型的可行性提供了令人鼓舞的实证证据。(A hedged code sketch of this debate protocol appears after this list.)

  2. Probabilistic Inference in Language Models via Twisted Sequential Monte Carlo, Stephen Zhao, Rob Brekelmans, Alireza Makhzani, Roger Grosse.

    Abstract: Numerous capability and safety techniques of Large Language Models (LLMs), including RLHF, automated red-teaming, prompt engineering, and infilling, can be cast as sampling from an unnormalized target distribution defined by a given reward or potential function over the full sequence. In this work, we leverage the rich toolkit of Sequential Monte Carlo (SMC) for these probabilistic inference problems. In particular, we use learned twist functions to estimate the expected future value of the potential at each timestep, which enables us to focus inference-time computation on promising partial sequences. We propose a novel contrastive method for learning the twist functions, and establish connections with the rich literature of soft reinforcement learning. As a complementary application of our twisted SMC framework, we present methods for evaluating the accuracy of language model inference techniques using novel bidirectional SMC bounds on the log partition function. These bounds can be used to estimate the KL divergence between the inference and target distributions in both directions. We apply our inference evaluation techniques to show that twisted SMC is effective for sampling undesirable outputs from a pretrained model (a useful component of harmlessness training and automated red-teaming), generating reviews with varied sentiment, and performing infilling tasks.

    摘要翻译:大型语言模型 (LLM) 的许多能力与安全技术,包括 RLHF、自动红队、提示工程和文本填充,都可以视为从由给定奖励或势函数在整个序列上定义的未归一化目标分布中采样。在这项工作中,我们利用序贯蒙特卡罗 (SMC) 的丰富工具包来解决这些概率推理问题。具体来说,我们使用学习到的扭曲 (twist) 函数来估计每个时间步上势函数的预期未来值,这使我们能够将推理时的计算集中在有希望的部分序列上。我们提出了一种学习扭曲函数的新型对比方法,并与软强化学习的丰富文献建立了联系。作为我们的扭曲 SMC 框架的补充应用,我们提出了使用对数配分函数的新型双向 SMC 界来评估语言模型推理技术准确性的方法。这些界可用于估计推理分布与目标分布之间两个方向上的 KL 散度。我们应用推理评估技术证明,扭曲 SMC 可以有效地从预训练模型中采样不良输出(这是无害性训练和自动红队的有用组成部分)、生成具有不同情感的评论,以及执行文本填充任务。(A toy twisted-SMC sketch appears after this list.)

  3. Stealing part of a production language model, Nicholas Carlini, Daniel Paleka, Krishnamurthy Dj Dvijotham, Thomas Steinke, Jonathan Hayase, A. Feder Cooper, Katherine Lee, Matthew Jagielski, Milad Nasr, Arthur Conmy, Itay Yona, Eric Wallace, David Rolnick, Florian Tramèr.

    Abstract: We introduce the first model-stealing attack that extracts precise, nontrivial information from black-box production language models like OpenAI's ChatGPT or Google's PaLM-2. Specifically, our attack recovers the embedding projection layer (up to symmetries) of a transformer model, given typical API access. For under $20 USD, our attack extracts the entire projection matrix of OpenAI's Ada and Babbage language models. We thereby confirm, for the first time, that these black-box models have a hidden dimension of 1024 and 2048, respectively. We also recover the exact hidden dimension size of the gpt-3.5-turbo model, and estimate it would cost under $2,000 in queries to recover the entire projection matrix. We conclude with potential defenses and mitigations, and discuss the implications of possible future work that could extend our attack.

    摘要翻译:我们介绍了第一个能从黑盒生产语言模型(如 OpenAI 的 ChatGPT 或 Google 的 PaLM-2)中提取精确、非平凡信息的模型窃取攻击。具体来说,在给定典型 API 访问权限的情况下,我们的攻击可以恢复 Transformer 模型的嵌入投影层(在对称性意义下)。花费不到 20 美元,我们的攻击就提取出了 OpenAI 的 Ada 和 Babbage 语言模型的整个投影矩阵。由此我们首次确认这些黑盒模型的隐藏维度分别为 1024 和 2048。我们还恢复了 gpt-3.5-turbo 模型的精确隐藏维度大小,并估计恢复整个投影矩阵的查询成本不到 2,000 美元。最后我们讨论了潜在的防御和缓解措施,以及可能扩展此攻击的未来工作的影响。(A toy simulation of the hidden-dimension recovery appears after this list.)

  4. Scaling Rectified Flow Transformers for High-Resolution Image Synthesis, Patrick Esser, Sumith Kulal, Andreas Blattmann, Rahim Entezari, Jonas Müller, Harry Saini, Yam Levi, Dominik Lorenz, Axel Sauer, Frederic Boesel, Dustin Podell, Tim Dockhorn, Zion English, Robin Rombach.

    Abstract: Diffusion models create data from noise by inverting the forward paths of data towards noise and have emerged as a powerful generative modeling technique for high-dimensional, perceptual data such as images and videos. Rectified flow is a recent generative model formulation that connects data and noise in a straight line. Despite its better theoretical properties and conceptual simplicity, it is not yet decisively established as standard practice. In this work, we improve existing noise sampling techniques for training rectified flow models by biasing them towards perceptually relevant scales. Through a large-scale study, we demonstrate the superior performance of this approach compared to established diffusion formulations for high-resolution text-to-image synthesis. Additionally, we present a novel transformer-based architecture for text-to-image generation that uses separate weights for the two modalities and enables a bidirectional flow of information between image and text tokens, improving text comprehension, typography, and human preference ratings. We demonstrate that this architecture follows predictable scaling trends and correlates lower validation loss to improved text-to-image synthesis as measured by various metrics and human evaluations. Our largest models outperform state-of-the-art models. Stability AI is considering making experimental data, code, and model weights publicly available.

    摘要翻译:扩散模型通过反转数据走向噪声的前向路径来从噪声中生成数据,已成为图像和视频等高维感知数据的一种强大生成建模技术。整流流 (rectified flow) 是一种较新的生成模型形式,它用直线连接数据和噪声。尽管它具有更好的理论性质和概念上的简洁性,但尚未被决定性地确立为标准做法。在这项工作中,我们改进了训练整流流模型的现有噪声采样技术,使其偏向感知上相关的尺度。通过一项大规模研究,我们证明了这种方法在高分辨率文本到图像合成上优于已有的扩散模型形式。此外,我们提出了一种用于文本到图像生成的新型基于 Transformer 的架构,它对两种模态使用独立的权重,并实现图像与文本 token 之间的双向信息流,从而改善了文本理解、排版和人类偏好评分。我们证明该架构遵循可预测的扩展趋势,并且较低的验证损失与更好的文本到图像合成效果相关,这一点通过各种指标和人工评估得到验证。我们最大的模型优于最先进的模型。Stability AI 正在考虑公开实验数据、代码和模型权重。(A minimal rectified-flow training sketch appears after this list.)

  5. Information Complexity of Stochastic Convex Optimization: Applications to Generalization, Memorization, and Tracing, Idan Attias, Gintare Karolina Dziugaite, Mahdi Haghifam, Roi Livni, Daniel Roy.

    Abstract: In this work, we investigate the interplay between memorization and learning in the context of stochastic convex optimization (SCO). We define memorization via the information a learning algorithm reveals about its training data points. We then quantify this information using the framework of conditional mutual information (CMI) proposed by Steinke and Zakynthinou (2020). Our main result is a precise characterization of the tradeoff between the accuracy of a learning algorithm and its CMI, answering an open question posed by Livni (2023). We show that, in the $L^2$ Lipschitz-bounded setting and under strong convexity, every learner with an excess error $\epsilon$ has CMI bounded below by $\Omega(1/\epsilon^2)$ and $\Omega(1/\epsilon)$, respectively. We further demonstrate the essential role of memorization in learning problems in SCO by designing an adversary capable of accurately identifying a significant fraction of the training samples in specific SCO problems. Finally, we enumerate several implications of our results, such as a limitation of generalization bounds based on CMI and the incompressibility of samples in SCO problems.

    摘要翻译:在这项工作中,我们研究了随机凸优化 (SCO) 背景下记忆和学习之间的相互作用。我们通过学习算法揭示的有关其训练数据点的信息来定义记忆。然后,我们使用 Steinke 和 Zakynthinou (2020) 提出的条件互信息 (CMI) 框架量化此信息。我们的主要结果是精确刻画了学习算法的准确性与其 CMI 之间的权衡,回答了 Livni (2023) 提出的一个开放性问题。我们表明,在 $L^2$ Lipschitz 有界设置和强凸设置下,任何过量误差为 $\epsilon$ 的学习器,其 CMI 的下界分别为 $\Omega(1/\epsilon^2)$ 和 $\Omega(1/\epsilon)$。我们通过设计一个能够准确识别特定 SCO 问题中相当一部分训练样本的对手,进一步证明了记忆在 SCO 学习问题中的重要作用。最后,我们列举了结果的若干推论,例如基于 CMI 的泛化界的局限性,以及 SCO 问题中样本的不可压缩性。

  6. Discrete Diffusion Modeling by Estimating the Ratios of the Data Distribution, Aaron Lou, Chenlin Meng, Stefano Ermon.

    Abstract: Despite their groundbreaking performance for many generative modeling tasks, diffusion models have fallen short on discrete data domains such as natural language. Crucially, standard diffusion models rely on the well-established theory of score matching, but efforts to generalize this to discrete structures have not yielded the same empirical gains. In this work, we bridge this gap by proposing score entropy, a novel loss that naturally extends score matching to discrete spaces, integrates seamlessly to build discrete diffusion models, and significantly boosts performance. Experimentally, we test our Score Entropy Discrete Diffusion models (SEDD) on standard language modeling tasks. For comparable model sizes, SEDD beats existing language diffusion paradigms (reducing perplexity by 25–75%) and is competitive with autoregressive models, in particular outperforming GPT-2. Furthermore, compared to autoregressive models, SEDD generates faithful text without requiring distribution annealing techniques like temperature scaling (around 6–8× better generative perplexity than un-annealed GPT-2), can trade compute and quality (similar quality with 32× fewer network evaluations), and enables controllable infilling (matching nucleus sampling quality while enabling other strategies besides left to right prompting).

    摘要翻译:尽管扩散模型在许多生成建模任务中取得了突破性的表现,但它们在自然语言等离散数据领域仍有所欠缺。至关重要的是,标准扩散模型依赖于成熟的分数匹配理论,但将其推广到离散结构的努力并没有带来同样的经验收益。在这项工作中,我们通过提出分数熵 (score entropy) 来弥补这一差距:这是一种新颖的损失函数,可以自然地将分数匹配扩展到离散空间,无缝地用于构建离散扩散模型,并显著提升性能。在实验上,我们在标准语言建模任务上测试了我们的分数熵离散扩散模型 (SEDD)。在可比的模型规模下,SEDD 优于现有的语言扩散范式(将困惑度降低 25–75%),并且与自回归模型相比具有竞争力,尤其是优于 GPT-2。此外,与自回归模型相比,SEDD 无需温度缩放等分布退火技术即可生成忠实的文本(生成困惑度比未退火的 GPT-2 好约 6–8 倍),可以在计算量和质量之间权衡(网络评估次数减少 32 倍而质量相近),并支持可控填充(在达到核采样 (nucleus sampling) 质量的同时,还支持从左到右提示之外的其他策略)。

  7. Position: Considerations for Differentially Private Learning with Large-Scale Public Pretraining, Florian Tramer, Gautam Kamath, Nicholas Carlini.

    Abstract: The performance of differentially private machine learning can be boosted significantly by leveraging the transfer learning capabilities of non-private models pretrained on large public datasets. We critically review this approach. We primarily question whether the use of large Web-scraped datasets should be viewed as differential-privacy-preserving. We further scrutinize whether existing machine learning benchmarks are appropriate for measuring the ability of pretrained models to generalize to sensitive domains. Finally, we observe that reliance on large pretrained models may lose other forms of privacy, requiring data to be outsourced to a more compute-powerful third party.

    摘要翻译:通过利用在大型公共数据集上预训练的非隐私模型的迁移学习能力,可以显著提升差分隐私机器学习的性能。我们批判性地审视了这种方法。我们主要质疑:使用大规模网络抓取的数据集是否应被视为保护差分隐私。我们进一步审视现有的机器学习基准是否适合衡量预训练模型泛化到敏感领域的能力。最后,我们观察到,对大型预训练模型的依赖可能会以其他形式的隐私为代价,因为这需要将数据外包给算力更强的第三方。

  8. Genie: Generative Interactive Environments, Jake Bruce, Michael Dennis, Ashley Edwards, Jack Parker-Holder, Yuge Shi, Edward Hughes, Matthew Lai, Aditi Mavalankar, Richie Steigerwald, Chris Apps, Yusuf Aytar, Sarah Bechtle, Feryal Behbahani, Stephanie Chan, Nicolas Heess, Lucy Gonzalez, Simon Osindero, Sherjil Ozair, Scott Reed, Jingwei Zhang, Konrad Zolna, Jeff Clune, Nando de Freitas, Satinder Singh, Tim Rocktäschel.

    Abstract: We introduce Genie, the first generative interactive environment trained in an unsupervised manner from unlabelled Internet videos. The model can be prompted to generate an endless variety of action-controllable virtual worlds described through text, synthetic images, photographs, and even sketches. At 11B parameters, Genie can be considered a foundation world model. It is comprised of a spatiotemporal video tokenizer, an autoregressive dynamics model, and a simple and scalable latent action model. Genie enables users to act in the generated environments on a frame-by-frame basis despite training without any ground-truth action labels or other domain specific requirements typically found in the world model literature. Further the resulting learned latent action space facilitates training agents to imitate behaviors from unseen videos, opening the path for training generalist agents of the future.

    摘要翻译:我们介绍了 Genie,这是第一个以无监督方式从未标注的互联网视频中训练出来的生成式交互环境。在提示下,该模型可以生成无穷无尽、动作可控的虚拟世界,这些世界可以通过文本、合成图像、照片甚至草图来描述。凭借 11B 的参数规模,Genie 可以被视为一个基础世界模型。它由时空视频分词器 (tokenizer)、自回归动力学模型以及一个简单且可扩展的潜在动作模型组成。Genie 使用户能够在生成的环境中逐帧行动,尽管训练时没有使用任何真实动作标签,也没有世界模型文献中常见的其他领域特定要求。此外,由此学到的潜在动作空间有助于训练智能体模仿未见过的视频中的行为,为训练未来的通用智能体开辟了道路。

  9. VideoPoet: A Large Language Model for Zero-Shot Video Generation, Dan Kondratyuk, Lijun Yu, Xiuye Gu, Jose Lezama, Jonathan Huang, Grant Schindler, Rachel Hornung, Vighnesh N Birodkar, Jimmy Yan, Ming-Chang Chiu, Krishna Somandepalli, Hassan Akbari, Yair Alon, Yong Cheng, Joshua V Dillon, Agrim Gupta, Meera Hahn, Anja Hauth, David Hendon, Alonso Martinez, David Minnen, Mikhail Sirotenko, Kihyuk Sohn, Xuan Yang, Hartwig Adam, Ming-Hsuan Yang, Irfan Essa, Huisheng Wang, David Ross, Bryan Seybold, Lu Jiang.

    Abstract: We present VideoPoet, a language model capable of synthesizing high-quality video from a large variety of conditioning signals. VideoPoet employs a decoder-only transformer architecture that processes multimodal inputs -- including images, videos, text, and audio. The training protocol follows that of Large Language Models (LLMs), consisting of two stages: pretraining and task-specific adaptation. During pretraining, VideoPoet incorporates a mixture of multimodal generative objectives within an autoregressive Transformer framework. The pretrained LLM serves as a foundation that can be adapted for a range of video generation tasks. We present empirical results demonstrating the model's state-of-the-art capabilities in zero-shot video generation, specifically highlighting the ability to generate high-fidelity motions. Project page: http://sites.research.google/videopoet/

    摘要翻译:我们推出了 VideoPoet,这是一种能够根据多种条件信号合成高质量视频的语言模型。VideoPoet 采用仅解码器 (decoder-only) 的 Transformer 架构,可处理多模态输入,包括图像、视频、文本和音频。其训练流程遵循大型语言模型 (LLM) 的范式,包括两个阶段:预训练和针对特定任务的适配。在预训练期间,VideoPoet 在自回归 Transformer 框架内结合了多种多模态生成目标。预训练得到的 LLM 可作为基础,适配到各种视频生成任务。我们给出的实证结果展示了该模型在零样本视频生成方面的最先进能力,特别是生成高保真运动的能力。项目页面:http://sites.research.google/videopoet/
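
To make a few of the methods in the list above more concrete, here are some hedged sketches. First, the debate protocol from paper 1: two stronger "expert" models argue for opposing answers over several rounds, and a weaker judge who cannot see the underlying evidence picks a side. The `expert` and `judge` callables stand in for any LLM API; the prompt wording and the number of rounds are illustrative assumptions, not the paper's exact setup.

```python
# Hedged sketch of the debate protocol from "Debating with More Persuasive
# LLMs Leads to More Truthful Answers": two expert debaters argue for
# opposing answers, and a weaker judge reads only the transcript and decides.
from typing import Callable

def debate(question: str, answer_a: str, answer_b: str,
           expert: Callable[[str], str], judge: Callable[[str], str],
           rounds: int = 3) -> str:
    transcript = f"Question: {question}\nA: {answer_a}\nB: {answer_b}\n"
    for r in range(rounds):
        # Each expert sees the transcript so far and argues for its side.
        arg_a = expert(f"{transcript}\nArgue that answer A is correct (round {r + 1}).")
        arg_b = expert(f"{transcript}\nArgue that answer B is correct (round {r + 1}).")
        transcript += f"\nDebater A: {arg_a}\nDebater B: {arg_b}\n"
    # The non-expert judge never sees the underlying evidence, only the debate.
    verdict = judge(f"{transcript}\nWhich answer is correct? Reply 'A' or 'B'.")
    return answer_a if verdict.strip().upper().startswith("A") else answer_b
```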
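
Next, a toy version of twisted SMC from paper 2. Particles are partial token sequences; after each sampling step they are reweighted by a twist function that estimates the expected future value of the terminal potential, then resampled so that computation concentrates on promising prefixes. Everything below (the uniform "language model", the potential, and the heuristic twist) is a stand-in of my own; the paper learns the twist functions with a contrastive objective.

```python
# Toy twisted SMC: reweight partial sequences by a twist estimating the
# expected future potential, resample, and repeat until full length.
import numpy as np

rng = np.random.default_rng(0)
VOCAB, LENGTH, N_PARTICLES = 5, 8, 64

def lm_step(seq):
    """Toy base language model: uniform distribution over the next token."""
    return np.full(VOCAB, 1.0 / VOCAB)

def potential(seq):
    """Terminal potential: reward full sequences with many copies of token 0."""
    return float(np.exp(2.0 * seq.count(0)))

def twist(seq):
    """Heuristic estimate of the expected future potential of a prefix
    (the paper learns these twist functions contrastively)."""
    remaining = LENGTH - len(seq)
    return float(np.exp(2.0 * seq.count(0)) * (1.0 + 0.4 * remaining))

particles = [[] for _ in range(N_PARTICLES)]
for t in range(LENGTH):
    weights = []
    for i, seq in enumerate(particles):
        token = int(rng.choice(VOCAB, p=lm_step(seq)))
        new_seq = seq + [token]
        # Incremental importance weight: twist ratio, with the true potential
        # replacing the twist at the final step.
        numerator = potential(new_seq) if t == LENGTH - 1 else twist(new_seq)
        weights.append(numerator / twist(seq))
        particles[i] = new_seq
    w = np.asarray(weights)
    w /= w.sum()
    # Resampling focuses computation on promising partial sequences.
    idx = rng.choice(N_PARTICLES, size=N_PARTICLES, p=w)
    particles = [list(particles[j]) for j in idx]

print("mean count of token 0 in sampled sequences:",
      float(np.mean([s.count(0) for s in particles])))
```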
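
For paper 3, the core observation is linear-algebraic: the final logits are a hidden state multiplied by the embedding projection matrix, so a matrix of logit vectors collected from many queries has rank at most the hidden dimension. The simulation below recovers that dimension from a synthetic "API"; it is only a sketch of the idea, not the paper's full attack, which also has to work with the limited logprob access of real production APIs.

```python
# Toy simulation of hidden-dimension recovery: stack logit vectors from many
# queries and count the singular values that rise above the noise floor.
# The "API" here is synthetic; W plays the role of the secret projection.
import numpy as np

rng = np.random.default_rng(0)
VOCAB, HIDDEN, N_QUERIES = 2000, 128, 512

W = rng.normal(size=(VOCAB, HIDDEN))  # secret embedding projection matrix

def api_logits(query_id: int) -> np.ndarray:
    """Toy black-box 'API': returns the full logit vector for one query."""
    h = np.random.default_rng(query_id).normal(size=HIDDEN)  # hidden state
    return W @ h + 1e-4 * rng.normal(size=VOCAB)             # slight noise

Q = np.stack([api_logits(i) for i in range(N_QUERIES)])      # (n, vocab)
U, s, Vh = np.linalg.svd(Q, full_matrices=False)

# Estimate d as the number of singular values far above the noise floor.
d_hat = int(np.sum(s > s[0] * 1e-3))
print("recovered hidden dimension:", d_hat)                  # expected: 128

# The top right-singular vectors span W's column space, i.e. they recover
# the projection layer up to an (unknown) d x d change of basis.
basis = Vh[:d_hat]
```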
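
Finally, a minimal rectified-flow training step for paper 4. The forward process is the straight line between a data point and Gaussian noise, and the network regresses the constant velocity along that line. The abstract says noise sampling is biased toward perceptually relevant scales; drawing the timestep from a logit-normal distribution, as done below, is one simple way to realize such a bias and is my assumption rather than a quotation of the paper's exact schedule. The tiny MLP is a placeholder for the paper's multimodal transformer.

```python
# Hedged sketch of one rectified-flow training step:
#   x_t = (1 - t) * x0 + t * noise, target velocity = noise - x0.
import torch
import torch.nn as nn

class TinyVelocityNet(nn.Module):
    """Stand-in for the paper's transformer backbone."""
    def __init__(self, dim: int = 64):
        super().__init__()
        self.net = nn.Sequential(nn.Linear(dim + 1, 256), nn.SiLU(),
                                 nn.Linear(256, dim))

    def forward(self, x_t: torch.Tensor, t: torch.Tensor) -> torch.Tensor:
        return self.net(torch.cat([x_t, t[:, None]], dim=-1))

model = TinyVelocityNet()
opt = torch.optim.Adam(model.parameters(), lr=1e-3)

x0 = torch.randn(32, 64)                 # stand-in "images"
noise = torch.randn_like(x0)

# Biased timestep sampling: logit-normal concentrates t away from 0 and 1.
t = torch.sigmoid(torch.randn(x0.shape[0]))

x_t = (1.0 - t[:, None]) * x0 + t[:, None] * noise   # straight-line path
target_velocity = noise - x0                          # d x_t / d t
loss = nn.functional.mse_loss(model(x_t, t), target_velocity)
loss.backward()
opt.step()
print("rectified-flow loss:", float(loss))
```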

# 机器学习基础理论 (Foundations of Machine Learning)

  1. Position: Embracing Negative Results in Machine Learning, Florian Karl, Malte Kemeter, Gabriel Dax, Paulina Sierak.
  2. Weak-to-Strong Generalization: Eliciting Strong Capabilities With Weak Supervision, Collin Burns, Pavel Izmailov, Jan Kirchner, Bowen Baker, Leo Gao, Leopold Aschenbrenner, Yining Chen, Adrien Ecoffet, Manas Joglekar, Jan Leike, Ilya Sutskever, Jeffrey K Wu.
  3. Position: The Platonic Representation Hypothesis, Minyoung Huh, Brian Cheung, Tongzhou Wang, Phillip Isola.
  4. Transformers Learn Nonlinear Features In Context: Nonconvex Mean-field Dynamics on the Attention Landscape, Juno Kim, Taiji Suzuki.
  5. Provable Multi-Task Representation Learning by Two-Layer ReLU Neural Networks, Liam Collins, Hamed Hassani, Mahdi Soltanolkotabi, Aryan Mokhtari, Sanjay Shakkottai.

# 深度学习 (Deep Learning)

  1. MorphGrower: A Synchronized Layer-by-layer Growing Approach for Plausible Neuronal Morphology Generation, Nianzu Yang, Kaipeng Zeng, Haotian Lu, Yexin Wu, Zexin Yuan, Danni Chen, Shengdian Jiang, Jiaxiang Wu, Yimin Wang, Junchi Yan.
  2. Robustness of Nonlinear Representation Learning, Simon Buchholz, Bernhard Schölkopf.

# 图学习与图神经网络 (Graph Learning and Graph Neural Networks)

  1. LSEnet: Lorentz Structural Entropy Neural Network for Deep Graph Clustering, Li Sun, Zhenhao Huang, Hao Peng, YuJie Wang, Chunyang Liu, Philip Yu.
  2. EquiPocket: an E(3)-Equivariant Geometric Graph Neural Network for Ligand Binding Site Prediction, Yang Zhang, Zhewei Wei, Ye Yuan, Chongxuan Li, Wenbing Huang.
  3. Expressivity and Generalization: Fragment-Biases for Molecular GNNs, Tom Wollschläger, Niklas Kemper, Leon Hetzel, Johanna Sommer, Stephan Günnemann.
  4. Preference Optimization for Molecule Synthesis with Conditional Residual Energy-based Models, Songtao Liu, Hanjun Dai, Yue Zhao, Peng Liu.
  5. Pruned Pivot: Correlation Clustering Algorithm for Dynamic, Parallel, and Local Computation Models, Mina Dalirrooyfard, Konstantin Makarychev, Slobodan Mitrovic.
  6. Less is More: on the Over-Globalizing Problem in Graph Transformers, Yujie Xing, Xiao Wang, Yibo Li, Hai Huang, Chuan Shi.

# 强化学习与控制 (Reinforcement Learning and Control)

  1. Is DPO Superior to PPO for LLM Alignment? A Comprehensive Study, Shusheng Xu, Wei Fu, Jiaxuan Gao, Wenjie Ye, Weilin Liu, Zhiyu Mei, Guangju Wang, Chao Yu, Yi Wu.

# 计算机视觉 (Computer Vision)

  1. Image Clustering with External Guidance, Yunfan Li, Peng Hu, Dezhong Peng, Jiancheng Lv, Jianping Fan, Xi Peng.
  2. ViP: A Differentially Private Foundation Model for Computer Vision, Yaodong Yu, Maziar Sanjabi, Yi Ma, Kamalika Chaudhuri, Chuan Guo.

# 自然语言处理 (Natural Language Processing)

  1. Debating with More Persuasive LLMs Leads to More Truthful Answers, Akbir Khan, John Hughes, Dan Valentine, Laura Ruis, Kshitij Sachan, Ansh Radhakrishnan, Edward Grefenstette, Samuel Bowman, Tim Rocktäschel, Ethan Perez.
  2. Arrows of Time for Large Language Models, Vassilis Papadopoulos, Jérémie Wenger, Clement Hongler.
  3. Compressible Dynamics in Deep Overparameterized Low-Rank Learning & Adaptation, Can Yaras, Peng Wang, Laura Balzano, Qing Qu.
  4. Symbolic Music Generation with Non-Differentiable Rule Guided Diffusion, Yujia Huang, Adishree Ghatare, Yuanzhe Liu, Ziniu Hu, Qinsheng Zhang, Chandramouli Shama Sastry, Siddharth Gururani, Sageev Oore, Yisong Yue.
  5. APT: Adaptive Pruning and Tuning Pretrained Language Models for Efficient Training and Inference, Bowen Zhao, Hannaneh Hajishirzi, Qingqing Cao.
  6. DITTO: Diffusion Inference-Time T-Optimization for Music Generation, Zachary Novack, Julian McAuley, Taylor Berg-Kirkpatrick, Nicholas Bryan.
  7. Improving Transformers with Dynamically Composable Multi-Head Attention, Da Xiao, Qingye Meng, Shengping Li, Xingyuan Yuan.
  8. DiJiang: Efficient Large Language Models through Compact Kernelization, Hanting Chen, Liuzhicheng Liuzhicheng, Xutao Wang, Yuchuan Tian, Yunhe Wang.
  9. Fast Timing-Conditioned Latent Audio Diffusion, Zach Evans, CJ Carr, Josiah Taylor, Scott Hawley, Jordi Pons.
  10. Listenable Maps for Audio Classifiers, Francesco Paissan, Mirco Ravanelli, Cem Subakan.

# 多模态与跨领域学习 (Multimodal and Cross-Domain Learning)

  1. Genie: Generative Interactive Environments, Jake Bruce, Michael Dennis, Ashley Edwards, Jack Parker-Holder, Yuge Shi, Edward Hughes, Matthew Lai, Aditi Mavalankar, Richie Steigerwald, Chris Apps, Yusuf Aytar, Sarah Bechtle, Feryal Behbahani, Stephanie Chan, Nicolas Heess, Lucy Gonzalez, Simon Osindero, Sherjil Ozair, Scott Reed, Jingwei Zhang, Konrad Zolna, Jeff Clune, Nando de Freitas, Satinder Singh, Tim Rocktäschel.
  2. Video-LaVIT: Unified Video-Language Pre-training with Decoupled Visual-Motional Tokenization, Yang Jin, Zhicheng Sun, Kun Xu, Kun Xu, Liwei Chen, Hao Jiang, Quzhe Huang, Chengru Song, Yuliang Liu, Di Zhang, Yang Song, Kun Gai, Yadong Mu.
  3. A Mechanistic Understanding of Alignment Algorithms: A Case Study on DPO and Toxicity, Andrew Lee, Xiaoyan Bai, Itamar Pres, Martin Wattenberg, Jonathan K. Kummerfeld, Rada Mihalcea.
  4. Video-of-Thought: Step-by-Step Video Reasoning from Perception to Cognition, Hao Fei, Shengqiong Wu, Wei Ji, Hanwang Zhang, Meishan Zhang, Mong-Li Lee, Wynne Hsu.
  5. VideoPoet: A Large Language Model for Zero-Shot Video Generation, Dan Kondratyuk, Lijun Yu, Xiuye Gu, Jose Lezama, Jonathan Huang, Grant Schindler, Rachel Hornung, Vighnesh N Birodkar, Jimmy Yan, Ming-Chang Chiu, Krishna Somandepalli, Hassan Akbari, Yair Alon, Yong Cheng, Joshua V Dillon, Agrim Gupta, Meera Hahn, Anja Hauth, David Hendon, Alonso Martinez, David Minnen, Mikhail Sirotenko, Kihyuk Sohn, Xuan Yang, Hartwig Adam, Ming-Hsuan Yang, Irfan Essa, Huisheng Wang, David Ross, Bryan Seybold, Lu Jiang.

# 时序数据与序列建模 (Temporal Data and Sequence Modeling)

  1. SparseTSF: Modeling Long-term Time Series Forecasting with 1k Parameters, Shengsheng Lin, Weiwei Lin, Wentai Wu, Haojun Chen, Junjie Yang.
  2. Unified Training of Universal Time Series Forecasting Transformers, Gerald Woo, Chenghao Liu, Akshat Kumar, Caiming Xiong, Silvio Savarese, Doyen Sahoo.
  3. SAMformer: Unlocking the Potential of Transformers in Time Series Forecasting with Sharpness-Aware Minimization and Channel-Wise Attention, Romain Ilbert, Ambroise Odonnat, Vasilii Feofanov, Aladin Virmaux, Giuseppe Paolo, Themis Palpanas, Ievgen Redko.

# 元学习与自动化机器学习 (Meta Learning and Automated Machine Learning)

# 人工智能安全 (AI Safety)

  1. Position: A Safe Harbor for AI Evaluation and Red Teaming, Shayne Longpre, Sayash Kapoor, Kevin Klyman, Ashwin Ramaswami, Rishi Bommasani, Borhane Blili-Hamelin, Yangsibo Huang, Aviya Skowron, Zheng Xin Yong, Suhas Kotha, Yi Zeng, Weiyan Shi, Xianjun Yang, Reid Southen, Alex Robey, Patrick Chao, Diyi Yang, Ruoxi Jia, Daniel Kang, Alex Pentland, Arvind Narayanan, Percy Liang, Peter Henderson.
  2. Making Old Things New: A Unified Algorithm for Differentially Private Clustering, Max Dupre la Tour, Monika Henzinger, David Saulpic.
  3. Position: Considerations for Differentially Private Learning with Large-Scale Public Pretraining, Florian Tramer, Gautam Kamath, Nicholas Carlini.
  4. Position: Beyond Personhood: Agency, Accountability, and the Limits of Anthropomorphic Ethical Analysis, Jessica Dai.
  5. How Private are DP-SGD Implementations?, Lynn Chua, Badih Ghazi, Pritish Kamath, Ravi Kumar, Pasin Manurangsi, Amer Sinha, Chiyuan Zhang.
  6. Private Truly-Everlasting Robust-Prediction, Uri Stemmer.
  7. Position: AI-Powered Autonomous Weapons Risk Geopolitical Instability and Threaten AI Research, Riley Simmons-Edler, Ryan Badman, Shayne Longpre, Kanaka Rajan.
  8. Position: Near to Mid-term Risks and Opportunities of Open-Source Generative AI, Francisco Eiras, Aleksandar Petrov, Bertie Vidgen, Christian Schroeder de Witt, Fabio Pizzati, Katherine Elkins, Supratik Mukhopadhyay, Adel Bibi, Botos Csaba, Fabro Steibel, Fazl Barez, Genevieve Smith, Gianluca Guadagni, Jon Chun, Jordi Cabot, Joseph Marvin Imperial, Juan Arturo Nolazco Flores, Lori Landay, Matthew T Jackson, Paul Röttger, Phil Torr, Trevor Darrell, Yong Suk Lee, Jakob Foerster.

# 大规模学习与分布式计算 (Scalable Learning and Distributed Computing)

# 应用与系统 (Applications and Systems)

  1. I/O Complexity of Attention, or How Optimal is FlashAttention?, Barna Saha, Christopher Ye.

# 其他 (Others)

  1. Position: Technical Research and Talent is Needed for Effective AI Governance, Anka Reuel, Lisa Soder, Benjamin Bucknall, Trond Undheim.

# Ref

  • ICML 2024 Awards list
  • ICML 2024 Oral paper list