Machine Learning

machinelearning@lemmy.ml

☆ Yσɠƚԋσʂ ☆@lemmy.ml · English · 15 days ago

How ‘Embeddings’ Encode What Words Mean

www.quantamagazine.org

0

9

☆ Yσɠƚԋσʂ ☆@lemmy.ml · English · 1 month ago

New AI model “learns” how to simulate Super Mario Bros. from video footage

arstechnica.com

0

4

☆ Yσɠƚԋσʂ ☆@lemmy.ml · English · 1 month ago

Reflection 70B holds its own against even the top closed-source models (Claude 3.5 Sonnet, GPT-4o)

0

12

☆ Yσɠƚԋσʂ ☆@lemmy.ml · English · 1 month ago

It’s Not Intelligent If It Always Halts: A Critical Perspective on Current Approaches to AGI

www.lifeiscomputation.com

6

9

☆ Yσɠƚԋσʂ ☆@lemmy.ml · English · 1 month ago

The Difference Between Speaking and Thinking

www.theatlantic.com

0

6

☆ Yσɠƚԋσʂ ☆@lemmy.ml · English · 1 month ago

Diffusion Models Are Real-Time Game Engines

gamengen.github.io

0

6

☆ Yσɠƚԋσʂ ☆@lemmy.ml · English · 1 month ago

Liger Kernel is a collection of Triton kernels designed specifically for LLM training. It can increase multi-GPU training throughput by 20% and reduce memory usage by 60%.

0

4

☆ Yσɠƚԋσʂ ☆@lemmy.ml · English · 2 months ago

Transformer Explainer

poloclub.github.io

0

2

☆ Yσɠƚԋσʂ ☆@lemmy.ml · English · 2 months ago

Alibaba claims no. 1 spot in AI math models with Qwen2-Math

venturebeat.com

0

8

yboutros@infosec.pub · English · 2 months ago

How to convert a positionally encoded predicted embedding from a decoder to its matching token?

2

4

☆ Yσɠƚԋσʂ ☆@lemmy.ml · English · 2 months ago

New Open-Source AI Image Generator Beats Midjourney, SD3 and AuraFlow

1

10

☆ Yσɠƚԋσʂ ☆@lemmy.ml · English · 2 months ago

AI models collapse when trained on recursively generated data

3

27

☆ Yσɠƚԋσʂ ☆@lemmy.ml · English · 3 months ago

RouteLLM: An Open-Source Framework for Cost-Effective LLM Routing

0

1

☆ Yσɠƚԋσʂ ☆@lemmy.ml · English · 3 months ago

Alibaba's Qwen LLM leading open-source rankings

0

6

☆ Yσɠƚԋσʂ ☆@lemmy.ml · English · 4 months ago

Using the same techniques Google used to master Go (MCTS and backprop), Llama 8B scores 96.7% on the GSM8K math benchmark. That’s better than GPT-4, Claude and Gemini, with 200x fewer parameters!

2

8

☆ Yσɠƚԋσʂ ☆@lemmy.ml · English · 4 months ago

Mixture of Agents (MoA) leverages several open-source LLM agents to achieve a score of 65.1% on AlpacaEval 2.0

www.together.ai

0

6

ylai@lemmy.ml · English · 4 months ago

From DeepSpeed to FSDP and Back Again with Hugging Face Accelerate

0

4

keepthepace@slrpnk.net · 4 months ago

Torrent tracker for open models

0

8

wargreymon@sh.itjust.works · 4 months ago

Can GPT generate a GPT model?

5

6

☆ Yσɠƚԋσʂ ☆@lemmy.mlEnglish · 5 months ago

Sakuga-42M Dataset: Scaling Up Cartoon Research

0

2
