The Importance of Positional Encoding Initialization in Transformers for Relational Reasoning

T Ito, L Cocchi, T Klinger, P Ram, M Campbell… - arXiv preprint arXiv …, 2024 - arxiv.org
Relational reasoning refers to the ability to infer and understand the relations between
multiple entities. In humans, this ability underpins many higher cognitive functions, such as …
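
The snippet cuts off before the method, but the design choice in the title is easy to make concrete: whether a learnable positional embedding starts from random values or from a structured (e.g. sinusoidal) table. A minimal PyTorch sketch of the two initializations, purely illustrative and not the paper's experimental setup:

```python
import math
import torch
import torch.nn as nn

def sinusoidal_table(max_len: int, d_model: int) -> torch.Tensor:
    """Fixed sinusoidal positional encodings (Vaswani et al., 2017)."""
    pos = torch.arange(max_len, dtype=torch.float32).unsqueeze(1)
    div = torch.exp(torch.arange(0, d_model, 2, dtype=torch.float32)
                    * (-math.log(10000.0) / d_model))
    table = torch.zeros(max_len, d_model)
    table[:, 0::2] = torch.sin(pos * div)
    table[:, 1::2] = torch.cos(pos * div)
    return table

max_len, d_model = 512, 64
pe_random = nn.Embedding(max_len, d_model)     # default init: N(0, 1)
pe_sinusoid = nn.Embedding(max_len, d_model)   # same module, structured start
with torch.no_grad():
    pe_sinusoid.weight.copy_(sinusoidal_table(max_len, d_model))
```

Both embeddings remain trainable; only the starting point differs, which is the kind of initialization effect the title points at.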

A dual-receptor model of serotonergic psychedelics: therapeutic insights from simulated cortical dynamics

A Juliani, V Chelu, L Graesser, A Safron - bioRxiv, 2024 - biorxiv.org
Serotonergic psychedelics have been identified as promising next-generation therapeutic
agents in the treatment of mood and anxiety disorders. While their efficacy has been …

Balanced Data, Imbalanced Spectra: Unveiling Class Disparities with Spectral Imbalance

C Kaushik, R Liu, CH Lin, A Khera, MY Jin… - arXiv preprint arXiv …, 2024 - arxiv.org
Classification models are expected to perform equally well for different classes, yet in practice,
there are often large gaps in their performance. This issue of class bias is widely studied …
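
As a rough illustration of what "spectral imbalance" could mean in practice: even with equal sample counts per class, the eigenvalue spectra of per-class feature covariances can differ sharply. A numpy sketch of that diagnostic (my reading of the title, not the paper's estimator):

```python
import numpy as np

rng = np.random.default_rng(0)

def class_spectrum(features: np.ndarray) -> np.ndarray:
    """Eigenvalues (descending) of one class's feature covariance."""
    return np.linalg.eigvalsh(np.cov(features, rowvar=False))[::-1]

d = 20
class_a = rng.normal(size=(500, d))                             # isotropic class
class_b = rng.normal(size=(500, d)) * np.linspace(3.0, 0.1, d)  # anisotropic class

spec_a, spec_b = class_spectrum(class_a), class_spectrum(class_b)
print("top-eigenvalue ratio (b/a):", spec_b[0] / spec_a[0])  # >> 1 despite balance
```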

Representation Learning Using a Single Forward Pass

A Somasundaram, P Mishra, A Borthakur - arXiv preprint arXiv …, 2024 - arxiv.org
We propose a neuroscience-inspired Solo Pass Embedded Learning Algorithm (SPELA).
SPELA is a prime candidate for training and inference applications in Edge AI devices. At the …
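
The snippet does not describe SPELA's update rule, so as a loose stand-in for the category it belongs to (single-pass, backprop-free representation learning), here is a generic Hebbian sketch in numpy; the rule, names, and hyperparameters are illustrative assumptions, not SPELA itself:

```python
import numpy as np

rng = np.random.default_rng(0)

def single_pass_hebbian(X: np.ndarray, n_out: int, lr: float = 0.01) -> np.ndarray:
    """See each sample exactly once; update weights with a local rule.
    NOT the SPELA algorithm, just a generic single-pass Hebbian learner."""
    W = rng.normal(scale=0.1, size=(n_out, X.shape[1]))
    for x in X:
        y = np.tanh(W @ x)                             # single forward pass
        W += lr * np.outer(y, x)                       # local correlation update
        W /= np.linalg.norm(W, axis=1, keepdims=True)  # keep rows bounded
    return W

X = rng.normal(size=(1000, 32))
W = single_pass_hebbian(X, n_out=16)
codes = np.tanh(X @ W.T)   # learned embeddings, no backward pass used
```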

AdaMoE: Token-Adaptive Routing with Null Experts for Mixture-of-Experts Language Models

Z Zeng, Y Miao, H Gao, H Zhang, Z Deng - arXiv preprint arXiv …, 2024 - arxiv.org
Mixture of experts (MoE) has become the standard for constructing production-level large
language models (LLMs) due to its promise to boost model capacity without causing significant …
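
A toy reading of "null experts": the router scores extra slots that produce no output, so a token whose top-k choices land on null slots simply spends fewer expert FLOPs. A numpy sketch of that mechanism (an interpretation of the title, not the paper's exact routing):

```python
import numpy as np

rng = np.random.default_rng(0)
d, n_real, n_null, k = 8, 4, 2, 2

experts = [rng.normal(size=(d, d)) for _ in range(n_real)]  # toy linear "FFN" experts
router_w = rng.normal(size=(n_real + n_null, d))            # scores real + null slots

def moe_forward(x: np.ndarray) -> np.ndarray:
    logits = router_w @ x
    top = np.argsort(logits)[-k:]                    # top-k over all slots
    gates = np.exp(logits[top]) / np.exp(logits[top]).sum()
    out = np.zeros(d)
    for g, idx in zip(gates, top):
        if idx < n_real:                     # null slots (idx >= n_real) add nothing,
            out += g * (experts[idx] @ x)    # so tokens routed there skip expert FLOPs
    return out

y = moe_forward(rng.normal(size=d))
```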

Benchmarking Predictive Coding Networks--Made Simple

L Pinchetti, C Qi, O Lokshyn, G Olivers, C Emde… - arXiv preprint arXiv …, 2024 - arxiv.org
In this work, we tackle the problems of efficiency and scalability for predictive coding networks
in machine learning. To do so, we first propose a library called PCX, whose focus lies on …
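
For readers new to the model family being benchmarked: a predictive coding network infers hidden activities by iteratively descending a local prediction-error energy, then updates weights from the residual errors. A single-hidden-layer numpy toy (generic PCN dynamics, not the PCX library's API):

```python
import numpy as np

rng = np.random.default_rng(0)
d_obs, d_hid = 16, 8
W = rng.normal(scale=0.1, size=(d_obs, d_hid))

def infer(x: np.ndarray, n_steps: int = 30, lr: float = 0.1):
    """Gradient descent on the prediction-error energy
    E = 0.5 * ||x - W tanh(z)||^2 with the observation x clamped."""
    z = np.zeros(d_hid)
    for _ in range(n_steps):
        err = x - W @ np.tanh(z)                       # local prediction error
        z += lr * (W.T @ err) * (1.0 - np.tanh(z) ** 2)
    return z, err

x = rng.normal(size=d_obs)
z, err = infer(x)
W += 0.01 * np.outer(err, np.tanh(z))   # local (Hebbian-like) weight update
```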

Calibrating Reasoning in Language Models with Internal Consistency

Z Xie, J Guo, T Yu, S Li - arXiv preprint arXiv:2405.18711, 2024 - arxiv.org
Large language models (LLMs) have demonstrated impressive capabilities in various reasoning
tasks, aided by techniques like chain-of-thought (CoT) prompting that elicits verbalized …
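
One plausible way to operationalize "internal consistency" is agreement between answer predictions decoded from intermediate layers and the final layer (a logit-lens-style readout); this construction is an assumption for illustration, not necessarily the paper's definition:

```python
import numpy as np

def internal_consistency(layer_logits: np.ndarray) -> float:
    """layer_logits: (n_layers, vocab) answer-token logits read out from
    each layer. Returns the fraction of layers whose greedy prediction
    agrees with the final layer's prediction."""
    preds = layer_logits.argmax(axis=-1)
    return float((preds == preds[-1]).mean())

rng = np.random.default_rng(0)
score = internal_consistency(rng.normal(size=(24, 100)))  # dummy logits
```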

S3D: A Simple and Cost-Effective Self-Speculative Decoding Scheme for Low-Memory GPUs

W Zhong, M Bharadwaj - arXiv preprint arXiv:2405.20314, 2024 - arxiv.org
Speculative decoding (SD) has attracted a significant amount of research attention due to the
substantial speedup it can achieve for LLM inference. However, despite the high speedups …
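
The generic loop behind speculative decoding is worth spelling out, since S3D builds on it: a cheap draft proposes several tokens and the target model verifies them, keeping the longest agreeing prefix. A greedy-acceptance sketch (full SD uses rejection sampling, and S3D's self-speculative drafting scheme is not shown in the snippet):

```python
from typing import Callable, List

def speculative_step(prefix: List[int],
                     draft: Callable[[List[int], int], List[int]],
                     target_next: Callable[[List[int]], int],
                     n_draft: int = 4) -> List[int]:
    """One round: the draft model proposes n_draft tokens; the target model
    verifies them left to right, keeping the longest agreeing prefix plus
    one corrected (or bonus) token of its own."""
    proposal = draft(prefix, n_draft)
    accepted: List[int] = []
    for tok in proposal:
        expected = target_next(prefix + accepted)
        if tok != expected:
            accepted.append(expected)   # target's own token replaces the miss
            return prefix + accepted
        accepted.append(tok)
    accepted.append(target_next(prefix + accepted))  # bonus token on full accept
    return prefix + accepted
```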

Block Transformer: Global-to-Local Language Modeling for Fast Inference

N Ho, S Bae, T Kim, H Jo, Y Kim, T Schuster… - arXiv preprint arXiv …, 2024 - arxiv.org
This paper presents the Block Transformer architecture, which applies hierarchical global-to-local
modeling to autoregressive transformers to mitigate the inference bottlenecks of self-…
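
A shape-level sketch of the global-to-local idea: compress each block of tokens into one coarse unit for the expensive global model, then let a light local model work within blocks conditioned on the broadcast context. Mean pooling stands in for the paper's block embedder, so this is structural intuition only:

```python
import numpy as np

rng = np.random.default_rng(0)
n_tokens, block_len, d = 64, 8, 32
tokens = rng.normal(size=(n_tokens, d))

# Global stage: one coarse unit per block, so the costly full-attention model
# runs over n_tokens / block_len positions (8 here instead of 64).
blocks = tokens.reshape(n_tokens // block_len, block_len, d).mean(axis=1)
# ... a global transformer over `blocks` would run here (omitted) ...

# Local stage: each token attends only within its own block, with block-level
# context broadcast back down to token resolution.
context = np.repeat(blocks, block_len, axis=0)   # (64, d) again
local_inputs = tokens + context
```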

Learning-to-Cache: Accelerating Diffusion Transformer via Layer Caching

X Ma, G Fang, MB Mi, X Wang - arXiv preprint arXiv:2406.01733, 2024 - arxiv.org
Diffusion Transformers have recently demonstrated unprecedented generative capabilities
for various tasks. The encouraging results, however, come at the cost of slow inference, …
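
The acceleration idea named in the title can be sketched as follows: because adjacent diffusion steps change some layers' activations very little, a per-layer mask decides whether to recompute a layer or reuse its output cached from the previous step. The paper learns that caching decision (hence "Learning-to-Cache"); here the mask is fixed and all names are illustrative:

```python
import numpy as np
from typing import Callable, List, Optional

def cached_denoise(x: np.ndarray,
                   layers: List[Callable[[np.ndarray], np.ndarray]],
                   reuse: List[bool],
                   n_steps: int) -> np.ndarray:
    """Toy sampler: layers with reuse[i] == True skip recomputation after
    the first step and return their activation cached from the previous
    step. The fixed `reuse` mask is an illustrative assumption."""
    cache: List[Optional[np.ndarray]] = [None] * len(layers)
    for step in range(n_steps):
        h = x
        for i, layer in enumerate(layers):
            if reuse[i] and cache[i] is not None:
                h = cache[i]        # reuse: no compute for this layer this step
            else:
                h = layer(h)
                cache[i] = h
        x = h
    return x

rng = np.random.default_rng(0)
layers = [lambda h, W=rng.normal(scale=0.1, size=(8, 8)): np.tanh(W @ h)
          for _ in range(4)]
out = cached_denoise(rng.normal(size=8), layers,
                     reuse=[False, True, True, False], n_steps=10)
```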