The Importance of Positional Encoding Initialization in Transformers for Relational Reasoning

T Ito, L Cocchi, T Klinger, P Ram, M Campbell… - arXiv preprint arXiv …, 2024 - arxiv.org
Relational reasoning refers to the ability to infer and understand the relations between
multiple entities. In humans, this ability underpins many higher cognitive functions, such as …
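
The snippet cuts off before the method, but the design choice in the title is easy to make concrete: whether a learnable positional embedding starts from random values or from a structured (e.g. sinusoidal) table. A minimal PyTorch sketch of the two initializations, purely illustrative and not the paper's experimental setup:

```python
import math
import torch
import torch.nn as nn

def sinusoidal_table(max_len: int, d_model: int) -> torch.Tensor:
    """Fixed sinusoidal positional encodings (Vaswani et al., 2017)."""
    pos = torch.arange(max_len, dtype=torch.float32).unsqueeze(1)
    div = torch.exp(torch.arange(0, d_model, 2, dtype=torch.float32)
                    * (-math.log(10000.0) / d_model))
    table = torch.zeros(max_len, d_model)
    table[:, 0::2] = torch.sin(pos * div)
    table[:, 1::2] = torch.cos(pos * div)
    return table

max_len, d_model = 512, 64
pe_random = nn.Embedding(max_len, d_model)     # default init: N(0, 1)
pe_sinusoid = nn.Embedding(max_len, d_model)   # same module, structured start
with torch.no_grad():
    pe_sinusoid.weight.copy_(sinusoidal_table(max_len, d_model))
```

Both embeddings remain trainable; only the starting point differs, which is the kind of initialization effect the title points at.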

A dual-receptor model of serotonergic psychedelics: therapeutic insights from simulated cortical dynamics

A Juliani, V Chelu, L Graesser, A Safron - bioRxiv, 2024 - biorxiv.org
Serotonergic psychedelics have been identified as promising next-generation therapeutic
agents in the treatment of mood and anxiety disorders. While their efficacy has been …

Balanced Data, Imbalanced Spectra: Unveiling Class Disparities with Spectral Imbalance

C Kaushik, R Liu, CH Lin, A Khera, MY Jin… - arXiv preprint arXiv …, 2024 - arxiv.org
Classification models are expected to perform equally well for different classes, yet in practice,
there are often large gaps in their performance. This issue of class bias is widely studied …
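
As a rough illustration of what "spectral imbalance" could mean in practice: even with equal sample counts per class, the eigenvalue spectra of per-class feature covariances can differ sharply. A numpy sketch of that diagnostic (my reading of the title, not the paper's estimator):

```python
import numpy as np

rng = np.random.default_rng(0)

def class_spectrum(features: np.ndarray) -> np.ndarray:
    """Eigenvalues (descending) of one class's feature covariance."""
    return np.linalg.eigvalsh(np.cov(features, rowvar=False))[::-1]

d = 20
class_a = rng.normal(size=(500, d))                             # isotropic class
class_b = rng.normal(size=(500, d)) * np.linspace(3.0, 0.1, d)  # anisotropic class

spec_a, spec_b = class_spectrum(class_a), class_spectrum(class_b)
print("top-eigenvalue ratio (b/a):", spec_b[0] / spec_a[0])  # >> 1 despite balance
```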

Representation Learning Using a Single Forward Pass

A Somasundaram, P Mishra, A Borthakur - arXiv preprint arXiv …, 2024 - arxiv.org
We propose a neuroscience-inspired Solo Pass Embedded Learning Algorithm (SPELA).
SPELA is a prime candidate for training and inference applications in Edge AI devices. At the …
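
The snippet does not describe SPELA's update rule, so as a loose stand-in for the category it belongs to (single-pass, backprop-free representation learning), here is a generic Hebbian sketch in numpy; the rule, names, and hyperparameters are illustrative assumptions, not SPELA itself:

```python
import numpy as np

rng = np.random.default_rng(0)

def single_pass_hebbian(X: np.ndarray, n_out: int, lr: float = 0.01) -> np.ndarray:
    """See each sample exactly once; update weights with a local rule.
    NOT the SPELA algorithm, just a generic single-pass Hebbian learner."""
    W = rng.normal(scale=0.1, size=(n_out, X.shape[1]))
    for x in X:
        y = np.tanh(W @ x)                             # single forward pass
        W += lr * np.outer(y, x)                       # local correlation update
        W /= np.linalg.norm(W, axis=1, keepdims=True)  # keep rows bounded
    return W

X = rng.normal(size=(1000, 32))
W = single_pass_hebbian(X, n_out=16)
codes = np.tanh(X @ W.T)   # learned embeddings, no backward pass used
```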

AdaMoE: Token-Adaptive Routing with Null Experts for Mixture-of-Experts Language Models

Z Zeng, Y Miao, H Gao, H Zhang, Z Deng - arXiv preprint arXiv …, 2024 - arxiv.org
Mixture of experts (MoE) has become the standard for constructing production-level large
language models (LLMs) due to its promise to boost model capacity without causing significant …
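
A toy reading of "null experts": the router scores extra slots that produce no output, so a token whose top-k choices land on null slots simply spends fewer expert FLOPs. A numpy sketch of that mechanism (an interpretation of the title, not the paper's exact routing):

```python
import numpy as np

rng = np.random.default_rng(0)
d, n_real, n_null, k = 8, 4, 2, 2

experts = [rng.normal(size=(d, d)) for _ in range(n_real)]  # toy linear "FFN" experts
router_w = rng.normal(size=(n_real + n_null, d))            # scores real + null slots

def moe_forward(x: np.ndarray) -> np.ndarray:
    logits = router_w @ x
    top = np.argsort(logits)[-k:]                    # top-k over all slots
    gates = np.exp(logits[top]) / np.exp(logits[top]).sum()
    out = np.zeros(d)
    for g, idx in zip(gates, top):
        if idx < n_real:                     # null slots (idx >= n_real) add nothing,
            out += g * (experts[idx] @ x)    # so tokens routed there skip expert FLOPs
    return out

y = moe_forward(rng.normal(size=d))
```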

Benchmarking Predictive Coding Networks--Made Simple

L Pinchetti, C Qi, O Lokshyn, G Olivers, C Emde… - arXiv preprint arXiv …, 2024 - arxiv.org
In this work, we tackle the problems of efficiency and scalability for predictive coding networks
in machine learning. To do so, we first propose a library called PCX, whose focus lies on …
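
For readers new to the model family being benchmarked: a predictive coding network infers hidden activities by iteratively descending a local prediction-error energy, then updates weights from the residual errors. A single-hidden-layer numpy toy (generic PCN dynamics, not the PCX library's API):

```python
import numpy as np

rng = np.random.default_rng(0)
d_obs, d_hid = 16, 8
W = rng.normal(scale=0.1, size=(d_obs, d_hid))

def infer(x: np.ndarray, n_steps: int = 30, lr: float = 0.1):
    """Gradient descent on the prediction-error energy
    E = 0.5 * ||x - W tanh(z)||^2 with the observation x clamped."""
    z = np.zeros(d_hid)
    for _ in range(n_steps):
        err = x - W @ np.tanh(z)                       # local prediction error
        z += lr * (W.T @ err) * (1.0 - np.tanh(z) ** 2)
    return z, err

x = rng.normal(size=d_obs)
z, err = infer(x)
W += 0.01 * np.outer(err, np.tanh(z))   # local (Hebbian-like) weight update
```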

Calibrating Reasoning in Language Models with Internal Consistency

Z Xie, J Guo, T Yu, S Li - arXiv preprint arXiv:2405.18711, 2024 - arxiv.org
Large language models (LLMs) have demonstrated impressive capabilities in various reasoning
tasks, aided by techniques like chain-of-thought (CoT) prompting that elicits verbalized …
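
One plausible way to operationalize "internal consistency" is agreement between answer predictions decoded from intermediate layers and the final layer (a logit-lens-style readout); this construction is an assumption for illustration, not necessarily the paper's definition:

```python
import numpy as np

def internal_consistency(layer_logits: np.ndarray) -> float:
    """layer_logits: (n_layers, vocab) answer-token logits read out from
    each layer. Returns the fraction of layers whose greedy prediction
    agrees with the final layer's prediction."""
    preds = layer_logits.argmax(axis=-1)
    return float((preds == preds[-1]).mean())

rng = np.random.default_rng(0)
score = internal_consistency(rng.normal(size=(24, 100)))  # dummy logits
```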

S3D: A Simple and Cost-Effective Self-Speculative Decoding Scheme for Low-Memory GPUs

W Zhong, M Bharadwaj - arXiv preprint arXiv:2405.20314, 2024 - arxiv.org
Speculative decoding (SD) has attracted a significant amount of research attention due to the
substantial speedup it can achieve for LLM inference. However, despite the high speedups …
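
The generic loop behind speculative decoding is worth spelling out, since S3D builds on it: a cheap draft proposes several tokens and the target model verifies them, keeping the longest agreeing prefix. A greedy-acceptance sketch (full SD uses rejection sampling, and S3D's self-speculative drafting scheme is not shown in the snippet):

```python
from typing import Callable, List

def speculative_step(prefix: List[int],
                     draft: Callable[[List[int], int], List[int]],
                     target_next: Callable[[List[int]], int],
                     n_draft: int = 4) -> List[int]:
    """One round: the draft model proposes n_draft tokens; the target model
    verifies them left to right, keeping the longest agreeing prefix plus
    one corrected (or bonus) token of its own."""
    proposal = draft(prefix, n_draft)
    accepted: List[int] = []
    for tok in proposal:
        expected = target_next(prefix + accepted)
        if tok != expected:
            accepted.append(expected)   # target's own token replaces the miss
            return prefix + accepted
        accepted.append(tok)
    accepted.append(target_next(prefix + accepted))  # bonus token on full accept
    return prefix + accepted
```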

Block Transformer: Global-to-Local Language Modeling for Fast Inference

N Ho, S Bae, T Kim, H Jo, Y Kim, T Schuster… - arXiv preprint arXiv …, 2024 - arxiv.org
This paper presents the Block Transformer architecture, which applies hierarchical global-to-local
modeling to autoregressive transformers to mitigate the inference bottlenecks of self-…
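
A shape-level sketch of the global-to-local idea: compress each block of tokens into one coarse unit for the expensive global model, then let a light local model work within blocks conditioned on the broadcast context. Mean pooling stands in for the paper's block embedder, so this is structural intuition only:

```python
import numpy as np

rng = np.random.default_rng(0)
n_tokens, block_len, d = 64, 8, 32
tokens = rng.normal(size=(n_tokens, d))

# Global stage: one coarse unit per block, so the costly full-attention model
# runs over n_tokens / block_len positions (8 here instead of 64).
blocks = tokens.reshape(n_tokens // block_len, block_len, d).mean(axis=1)
# ... a global transformer over `blocks` would run here (omitted) ...

# Local stage: each token attends only within its own block, with block-level
# context broadcast back down to token resolution.
context = np.repeat(blocks, block_len, axis=0)   # (64, d) again
local_inputs = tokens + context
```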

Learning-to-Cache: Accelerating Diffusion Transformer via Layer Caching

X Ma, G Fang, MB Mi, X Wang - arXiv preprint arXiv:2406.01733, 2024 - arxiv.org
Diffusion Transformers have recently demonstrated unprecedented generative capabilities
for various tasks. The encouraging results, however, come at the cost of slow inference, …
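
The acceleration idea named in the title can be sketched as follows: because adjacent diffusion steps change some layers' activations very little, a per-layer mask decides whether to recompute a layer or reuse its output cached from the previous step. The paper learns that caching decision (hence "Learning-to-Cache"); here the mask is fixed and all names are illustrative:

```python
import numpy as np
from typing import Callable, List, Optional

def cached_denoise(x: np.ndarray,
                   layers: List[Callable[[np.ndarray], np.ndarray]],
                   reuse: List[bool],
                   n_steps: int) -> np.ndarray:
    """Toy sampler: layers with reuse[i] == True skip recomputation after
    the first step and return their activation cached from the previous
    step. The fixed `reuse` mask is an illustrative assumption."""
    cache: List[Optional[np.ndarray]] = [None] * len(layers)
    for step in range(n_steps):
        h = x
        for i, layer in enumerate(layers):
            if reuse[i] and cache[i] is not None:
                h = cache[i]        # reuse: no compute for this layer this step
            else:
                h = layer(h)
                cache[i] = h
        x = h
    return x

rng = np.random.default_rng(0)
layers = [lambda h, W=rng.normal(scale=0.1, size=(8, 8)): np.tanh(W @ h)
          for _ in range(4)]
out = cached_denoise(rng.normal(size=8), layers,
                     reuse=[False, True, True, False], n_steps=10)
```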