AI Engineer
  • 57 videos
  • 830,832 views
What's new from Anthropic and what's next: Alex Albert
Explore Anthropic's latest strides in large language models, emphasizing enhanced reasoning and multimodal capabilities. We'll showcase how these advancements translate into powerful developer tools, APIs, and best practices for building sophisticated, RSP-aligned AI applications.
Recorded live in San Francisco at the AI Engineer World's Fair. See the full schedule of talks at www.ai.engineer/worldsfair/2024/schedule & join us at the AI Engineer World's Fair in 2025! Get your tickets today at ai.engineer/2025
About Alex Albert
Alex Albert is the Head of Developer Relations at Anthropic. Prior to his current role, he spent a year as a Prompt Engineer on Anthropic's Product Research team....
Views: 363

Videos

How Codeium Breaks Through the Ceiling for Retrieval: Kevin Hou
7K views · 9 hours ago
Codeium is trailblazing the next frontier in retrieval, and here's a hint: it's not just embeddings. Learn what the next generation of retrieval looks like and how 1M developers are already leveraging this superpower using the Codeium IDE plugin for AI autocomplete, chat, and search. We'll dive deep into how existing benchmarks are failing us, what it takes to serve our custom models at scale, and what t...
Emergence Launch: AI Agents and the future enterprise: Dr. Satya Nitta
1.6K views · 9 hours ago
AI agents are poised to revolutionize software systems and devices, promising unprecedented automation and efficiency for enterprises. However, the road to this future is riddled with challenges such as inefficiency, non-determinism, high costs, discoverability, and rapid technological evolution. At Emergence, we are tackling these challenges head-on to transform the vision of useful AI agents ...
Low Level Technicals of LLMs: Daniel Han
13K views · 9 hours ago
This workshop will be split into 3x one-hour blocks:
  1. How to analyze & fix LLMs: how to find and fix bugs in Gemma, Phi-3, Llama & tokenizers
  2. Finetuning with Unsloth: continued pretraining, reward modelling, QLoRA & more
  3. Deep dive into LLM technicals: hand-deriving derivatives, SOTA finetuning tricks
It's recommended you have Python with PyTorch and Unsloth installed (or use online Google Col...
Fixing bugs in Gemma, Llama, & Phi 3: Daniel Han
1.5K views · 9 hours ago
The story behind our 8 bug fixes for Gemma, multiple tokenization fixes for Llama 3, a sliding window bug fix, and Mistral-fying Phi-3; learn how we analyse, find, and fix bugs in open source models, and how we make finetuning 2x faster for all these models. Recorded live in San Francisco at the AI Engineer World's Fair. See the full schedule of talks at www.ai.engineer/worldsf...
Copilots Everywhere: Thomas Dohmke and Eugene Yan
1K views · 21 hours ago
Join GitHub CEO Thomas Dohmke for the closing keynote, with a deep dive on Copilot Workspace and what's ahead as he talks about AI's coming agentic wave. Recorded live in San Francisco at the AI Engineer World's Fair. See the full schedule of talks at www.ai.engineer/worldsfair/2024/schedule & join us at the AI Engineer World's Fair in 2025! Get your tickets today at ai.engineer/2025 About Thomas ...
Unlocking Developer Productivity across CPU and GPU with MAX: Chris Lattner
3.2K views · 1 day ago
Today's leading generative AI applications have workloads that span high-performance GPU compute, CPU preprocessing, data loading, and orchestration - often spread across a combination of Python, C++/Rust, and CUDA C++ - which increases complexity and slows down the cycle of innovation. This talk explores the capabilities and power of the Modular Mojo programming language and Modular Accelerat...
From Software Developer to AI Engineer: Antje Barth
3.4K views · 1 day ago
In this keynote, Antje explores how generative AI is transforming the landscape of software development, enabling developers to innovate like never before. She will showcase the latest advancements in AI and demonstrate the powerful capabilities of generative AI tools available on AWS. You will learn how to harness these tools to accelerate your development processes, enhance creativity, and bu...
Lessons From A Year Building With LLMs
11K views · 14 days ago
Special double-feature closing keynote from the 6 authors of the hit O'Reilly article on Applied LLMs. Recorded live in San Francisco at the AI Engineer World's Fair. See the full schedule of talks at www.ai.engineer/worldsfair/2024/schedule & join us at the AI Engineer World's Fair in 2025! Get your tickets today at ai.engineer/2025 About Eugene Yan I build ML systems to serve customers at sca...
Open Challenges for AI Engineering: Simon Willison
5K views · 14 days ago
About Simon Simon Willison is the creator of Datasette, an open source tool for exploring and publishing data. He currently works full-time building open source tools for data journalism, built around Datasette and SQLite. Prior to becoming an independent open source developer, Simon was an engineering director at Eventbrite. Simon joined Eventbrite through their acquisition of Lanyrd, a Y Comb...
Llamafile: bringing AI to the masses with fast CPU inference: Stephen Hood and Justine Tunney
34K views · 14 days ago
Mozilla's Llamafile open source project democratizes access to AI not only by making open models easier to use, but also by making them run fast on consumer CPUs. Lead developer Justine Tunney will share the insights, tricks, and hacks that she and the project community are using to deliver these performance breakthroughs, and project leader Stephen Hood will discuss Mozilla's approach to suppo...
The Future of Knowledge Assistants: Jerry Liu
55K views · 21 days ago
In this talk, LlamaIndex founder & CEO Jerry Liu covers how we go beyond single-LLM prompt calls. He discusses advanced single-agent flows, Agentic RAG, multi-agent task-solvers & service architectures, and more. Jerry also announces Llama Agents: Agents as microservices that are easily deployed and communicate via a single API (and much more). Recorded live in San Francisco at the AI Engineer ...
The Making of Devin by Cognition AI: Scott Wu
6K views · 21 days ago
Meet Devin, a state-of-the-art AI software agent that helps developers save time and achieve more. Scott Wu, co-founder and CEO of Cognition AI, demos its capabilities and shares some of the lessons that he and his team have learned while building Devin. Recorded live in San Francisco at the AI Engineer World's Fair. See the full schedule of talks at www.ai.engineer/worldsfair/2024/schedule & j...
From Text to Vision to Voice: Exploring Multimodality with OpenAI: Romain Huet
7K views · 21 days ago
The future we are building towards: featuring a demo of GPT4o Omnimodel Voice, ChatGPT Desktop, Sora, and Voice Engine all in one talk. Recorded live in San Francisco at the AI Engineer World's Fair. See the full schedule of talks at www.ai.engineer/worldsfair/2024/schedule & join us at the AI Engineer World's Fair in 2025! Get your tickets today at ai.engineer/2025 About Romain Hello! I’m a so...
The Code AI Maturity Model and What It Means For You: Ado Kukic
1.5K views · 5 months ago
How to Become an AI Engineer from a Fullstack Background - Reid Mayo
6K views · 6 months ago
Using AI to Build an Infinite Game: Jeff Schomay
769 views · 6 months ago
GPT Web App Generator - 10,000 apps created in a month: Matija Sosic
1.7K views · 6 months ago
Storyteller: Building Multi-modal Apps with TS & ModelFusion - Lars Grammel, PhD
989 views · 6 months ago
Open Questions for AI Engineering: Simon Willison
4.5K views · 8 months ago
Trust, but Verify: Shreya Rajpal
3.6K views · 8 months ago
Harnessing the Power of LLMs Locally: Mithun Hunsur
2.3K views · 8 months ago
The Weekend AI Engineer: Hassan El Mghari
2.4K views · 8 months ago
120k players in a week: Lessons from the first viral CLIP app: Joseph Nelson
1.2K views · 8 months ago
Building Production-Ready RAG Applications: Jerry Liu
296K views · 8 months ago
Retrieval Augmented Generation in the Wild: Anton Troynikov
3.1K views · 8 months ago
Domain adaptation and fine-tuning for domain-specific LLMs: Abi Aryan
5K views · 8 months ago
Pragmatic AI with TypeChat: Daniel Rosenwasser
1.9K views · 8 months ago
Building Reactive AI Apps: Matt Welsh
4.9K views · 8 months ago
AI Engineering 201: The Rest of the Owl
2.8K views · 8 months ago

Comments

  • @ayushshukla3538 · 40 minutes ago

    Hey, a Bezos fan here. I liked the Bezos electricity analogy used here; he used it for AWS at Y Combinator Startup School, and now it's used for LLMs.

  • @micbab-vg2mu · 40 minutes ago

    The presentation is great :) I hope we will have new models from Anthropic soon :)

  • @nimeshgurung6600 · 59 minutes ago

    Man, I absolutely love Claude. Anthropic is absolutely amazing in terms of the finished product; I've completely stopped using ChatGPT, almost by habit.

  • @brunomattesco · 5 hours ago

    This micro-agents structure is exactly what I was thinking about yesterday; I want to sell a SaaS built on it.

  • @Jason_RA · 12 hours ago

    This is absolutely amazing!

  • @JOHNSMITH-ve3rq · 1 day ago

    Dude's squeaky style of speech is really annoying. The audience guy with totally dumb questions is also annoying. 30 minutes in, I didn't learn anything. Does it get better??

  • @constantinegeist1854 · 1 day ago

    All of this was already possible before, back in early 2023. What they did was just save you 15 minutes (otherwise you'd have to download an inference program and weights separately).

  • @CaptainSpoonsAlot · 1 day ago

    this is just fantastic.

  • @superfliping · 2 days ago

    As soon as you say you're backed by a Fortune 500 company and they like your coding and data, it probably means you're a sellout to that company too, and you're trying to take human data and sell it to the highest-rated players. Why should we believe you? Do you run CrowdStrike as your secure communication system too?

  • @Bakmandour · 2 days ago

    If we see agents as microservices, why not reuse existing microservices infrastructure that has proven reliable for years now? Truly curious about the reasons.

    • @Bakmandour · 2 days ago

      @Jerry Liu

    • @zacboyles1396 · 17 hours ago

      You absolutely should be; I'm of the opinion that's where the biggest gains are being made. Micro agents can enhance old exception-handling processes, with specialized agents redirecting requests while factoring in live system information or contextual data. In general it allows your old microservices to handle more complex tasks or accept a wider variety of inputs. Think about all the processes with some type of minimum-criteria requirement where failed requests get passed to more expensive, often manual or human-involved workflows. A cheap micro agent can fill in missing details or approve alternative workflows. To say it's a polish for microservices is an understatement; it's more like a powered exoskeleton with Jarvis to keep them company. 😂
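The fallback pattern described in this thread can be sketched in a few lines. This is a minimal illustration with hypothetical names (`validate`, `micro_agent_repair`, the request fields), not code from any of the talks: a request that fails a service's minimum-criteria check is handed to a cheap "micro agent" before escalating to a human workflow.

```python
# Hypothetical sketch of a micro agent backing up a microservice.
# All names and fields are invented for illustration.

def validate(request: dict) -> bool:
    # Minimum criteria: the downstream service needs both fields.
    return "customer_id" in request and "amount" in request

def micro_agent_repair(request: dict) -> dict:
    # Stand-in for an LLM call that infers missing fields from context.
    repaired = dict(request)
    if "amount" not in repaired and "line_items" in repaired:
        repaired["amount"] = sum(repaired["line_items"])
    return repaired

def handle(request: dict) -> str:
    if validate(request):
        return "processed"
    # Failed the cheap check: try the micro agent before escalating.
    repaired = micro_agent_repair(request)
    if validate(repaired):
        return "processed-after-repair"
    return "escalated-to-human"
```

For example, `handle({"customer_id": 1, "line_items": [2, 3]})` would be repaired and processed instead of landing in the manual queue.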

  • @swyxTV · 2 days ago

    Thanks for speaking Daniel!!

  • @realisticlevel2553 · 2 days ago

    the goat

  • @666WolfWere · 3 days ago

    Layer norm helps to avoid gradient vanishing or explosion. Before that, it was almost impossible to train a deep network.

    • @danielhanchen · 2 days ago

      Oh yes! Vanishing and exploding gradients! I remember people first said batch norm was used to reduce "internal covariate shift", but I more ascribe to the smoother and easier optimization reasoning for layernorms
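As a quick illustration of the normalization being discussed (a hedged sketch for intuition, not any model's actual kernel), layer norm rescales each input vector across its features, so activations keep a stable scale no matter how large the raw values grow:

```python
import numpy as np

# Layer norm: y = (x - mean(x)) / sqrt(var(x) + eps) * gamma + beta,
# computed per vector across its features. Keeping activations at a
# stable scale is what helps against vanishing/exploding gradients.
def layer_norm(x: np.ndarray, gamma: float = 1.0, beta: float = 0.0,
               eps: float = 1e-5) -> np.ndarray:
    mean = x.mean(axis=-1, keepdims=True)
    var = x.var(axis=-1, keepdims=True)
    return (x - mean) / np.sqrt(var + eps) * gamma + beta

x = np.array([1e3, 2e3, 3e3])  # large-scale activations
y = layer_norm(x)
print(y.round(3))              # ≈ [-1.225, 0.0, 1.225]
```

The output is the same whether the input is `[1, 2, 3]` or `[1000, 2000, 3000]`, which is exactly the scale-invariance property the comment above alludes to.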

  • @robertputneydrake · 3 days ago

    Is this Roger from American Dad?

  • @ShaunShibu-oz8yn · 3 days ago

    How can you ever know that these LLM providers don't intentionally mix leaderboard data in smart ways to game the ranking?

    • @danielhanchen · 2 days ago

      Yep a huge problem - I normally trust the Hard Prompts section in the Chat LMSYS Leaderboard, and just rely on Redditors liking or disliking models - sadly we don't know for sure if models include the outputs of the Chatbot Arena dataset - some models at least explicitly state they train on the inputs / instructions of conversations.

  • @shafikhan2 · 3 days ago

    @KevinHou22 - We are looking for AI experts interested in part-time collaboration. If you're interested in contributing to cutting-edge AI projects for a startup, let's connect!

  • @oguzhanyldrm962 · 3 days ago

    🎯 Key points for quick navigation:

    00:35 *📊 Overview of Workshop and Introductions*
      - Introduction to low-level technical analysis of language models by Daniel Han.
      - Discussing the purpose of finding and fixing bugs in various language model implementations.
    01:56 *🐛 Finding Bugs in Gemma and Other Models*
      - Explanation of initial bug discoveries in Gemma, including issues with approximate vs exact calculations.
      - Highlighting the complexity and variation in different model implementations.
    04:03 *🧠 Analyzing Architecture and Quirks of Large Models*
      - Discussing architectural quirks in large models like Nvidia's 340 billion parameter model.
      - Exploration of non-standard implementations such as squared activations.
    05:16 *🧩 Challenges in Tokenization*
      - Addressing tokenization challenges and discrepancies across models.
      - Example issues with token variants causing different results across implementations.
    06:25 *📊 Broadening Discussion Beyond Language Models*
      - Introduction to broader technical knowledge including SVD, PCA, and machine learning fundamentals.
      - Encouragement for exploration and understanding of foundational algorithms.
    10:09 *💻 Unsloth and Optimization Techniques*
      - Introduction to Unsloth, optimizing fine-tuning of language models for efficiency.
      - Discussion on Triton kernels and CUDA programming for GPU optimization.
    13:05 *⚙️ Understanding Sparsity in GPUs*
      - Explanation of the sparsity feature in GPUs and its impact on training speed.
      - Clarifying benefits and challenges of enabling sparsity in language model training.
    15:09 *📈 Learning Rate Schedules and Model Training*
      - Discussion on the impact of learning rate schedules and epochs on model training.
      - Evaluation of different methodologies and their influence on model performance.
    18:11 *🏆 Impact of Bug Fixes on Model Performance*
      - Audience anecdote on the unexpected performance improvement post-bug-fix in Gemma.
      - Speculation on the multifaceted nature of bug fixes and their cumulative effect.
    20:00 *🛠️ Overview of GPU Memory Management and Efficiency in Model Training*
      - Understanding how offloading GPU memory to system RAM can affect execution speed.
      - Importance of correct memory offloading to avoid performance degradation.
    21:13 *🧠 Introduction to Transformer Architecture and Its Applications*
      - Explanation of the Transformer's role in language models like GPT-4, GPT-3, and others.
      - Versatility of Transformers beyond language modeling for sequence modeling tasks.
    32:34 *📊 Tokenization Strategies: From Simple to Industry Standard*
      - Creation and shortcomings of a basic tokenization method with combined punctuation.
      - Issues identified such as vocabulary inflation and lack of normalization.
    43:17 *🧠 Understanding Sequence Modeling*
      - Sequence modeling involves predicting subsequent words iteratively.
      - Never use future data in training machine learning models.
    44:16 *🎲 Importance of Tokenization in Language Models*
      - Tokenization requires each component to have the same number of numerical tokens.
      - The number of combinations in token assignment can be infinite in theory.
    46:06 *🛠️ Initialization and Training Considerations*
      - Random initialization of model parameters can lead to issues like exploding gradients.
      - Proper initialization is crucial to prevent training instability.
    47:34 *📊 Structure of Training Data for Language Models*
      - Training data for language models consists of sequences of tokenized text.
      - Each sequence of tokens can be represented as a table of numerical embeddings.
    49:15 *🔄 Training Mechanism and Transformer Architecture*
      - Language models predict the next word in a sequence using shifted token predictions.
      - The Transformer architecture includes attention and MLP layers for prediction refinement.
    50:13 *🌐 Components of Language Models*
      - Language models consist of prediction and MLP (Multi-Layer Perceptron) components.
      - The attention mechanism in Transformers enhances sequence modeling capabilities.
    51:08 *🤔 Exploring Multi-Token Prediction in Transformers*
      - Transformers can predict multiple tokens at once by adjusting training objectives.
      - Multi-token prediction can expedite inference time in language models.
    52:51 *🔑 Tokenization and Embedding Process*
      - Tokenizers convert text tokens into numerical IDs for embedding lookup.
      - Embedding dimensions determine the vector representation's complexity for each token.
    54:24 *🚀 Enhancing Training Efficiency with Padding and Tokenization*
      - Padding tokens to a specific length can optimize GPU caching and training speed.
      - Tokenizers with padded vocabularies enhance data processing efficiency.
    56:06 *⚠️ Handling Tokenization Errors and Untrained Tokens*
      - Tokenization errors can occur when using untrained tokens in fine-tuning.
      - Setting untrained tokens to the mean embedding mitigates model training issues.
    59:25 *🌐 Complexity Reduction in Language Model Training*
      - Language models utilize attention mechanisms to reduce computational complexity.
      - Masked attention allows language models to skip predicting future tokens.
    01:00:49 *🧩 Mechanisms of Attention and Masking in Transformers*
      - Attention mechanisms in Transformers utilize masking to skip irrelevant token interactions.
      - Softmax normalization in attention mechanisms ensures probabilistic token predictions.
    01:06:26 *🧮 Softmax and Layer Norms*
      - Layer norms normalize inputs across features, stabilizing training and improving model performance.
    01:09:12 *📊 Backpropagation Challenges*
      - Differentiating layer norms during backpropagation involves complex matrix operations.
      - Triton's implementation complexities lie in managing gradients effectively for layer norms.
    01:13:24 *🧩 Positional Encodings: RoPE Embeddings*
      - RoPE embeddings enhance Transformer accuracy by encoding positional information dynamically.
      - Absolute positional encodings are simpler but less effective compared to dynamic methods like RoPE embeddings.
    01:21:01 *🔄 Derivatives and MLP in Transformers*
      - Deriving gradients for RoPE embeddings involves specialized matrix operations like rotation matrices.
      - MLP components in Transformers mix signals to enhance model expressiveness and learning flexibility.
    01:27:54 *🧠 Understanding Matrix Operations in LLMs*
      - Matrix operations like W_up, W_gate, and W_down are crucial in attention mechanisms.
      - These matrices are trained to enhance model capacity and projection efficiency.
    01:29:42 *📊 Managing Derivatives and Mathematical Formulas*
      - Deriving formulas manually for complex functions like softmax derivatives is challenging and time-consuming.
      - Tools like Desmos aid in visualizing and verifying mathematical derivations.
    01:32:06 *🛠️ Enhancing Stability and Performance with Chunking*
      - Chunking techniques optimize GPU memory usage for large vocabulary sizes in models like Llama.
      - Techniques such as subtracting the maximum value in softmax enhance stability during training.
    01:35:17 *🔍 Exploring Implementation Details of Llama Architecture*
      - Detailed examination of key components like layer norms and rotary embeddings in Llama models.
      - Insight into specific code segments for layer norm kernels and architectural optimizations.
    01:50:03 *🧠 Low-Level Technical Details of LLMs*
      - Understanding the architecture of LLMs involves multiple layers and operations, culminating in generating logits for token prediction.
      - Upcasting to float32 from float16 enhances training stability by preventing NaNs due to large exponentials in softmax calculations.
    01:57:40 *🔍 Analyzing Gemma Bugs*
      - Detailed exploration of bugs in Gemma models reveals issues such as missing BOS tokens and typographical errors in papers.
    02:01:22 *🔄 Decisions in Model Implementation*
      - Choosing between different model fixes (like the blue versus black line) involves balancing multiple errors and aligning with original implementations.
    02:12:52 *🧮 Floating Point Formats and Performance Comparison*
      - Overview of floating point formats (float16, float32) and their transistor requirements.
    02:15:13 *🚀 Future of GPU Precision: Float16 and Beyond*
      - Discussion on the potential future of GPU precision beyond float16.
    02:22:12 *🔍 Analyzing Precision Issues in Machine Learning Models*
      - Issues and considerations when implementing different precision formats in machine learning models.
    02:28:11 *🛠️ Debugging Challenges in Precision Implementation*
      - Challenges and methodologies for debugging precision-related issues in ML frameworks.
    02:34:16 *🐍 Analyzing Implementation Differences*
      - Comparing implementations between Hugging Face, PyTorch, and JAX.
    02:36:26 *🐞 Issues with Sliding Window Implementation*
      - Discussing the sliding window bug in LLMs, specifically with a token limit of 2047 instead of 2048.
    02:40:25 *🛠️ Tokenization Challenges and Solutions*
      - Addressing challenges in tokenizer configurations and functionality.
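The softmax stability trick mentioned around 01:32:06 and 01:50:03 (subtracting the row maximum before exponentiating) can be illustrated in a few lines. This is a sketch for intuition only, not the talk's Triton implementation:

```python
import numpy as np

# Naive softmax overflows for large logits because exp(x) blows up.
def softmax_naive(x):
    e = np.exp(x)
    return e / e.sum()

# Subtracting the max is mathematically identical (the exp(max)
# factor cancels in the ratio) but keeps every exponent <= 0,
# so nothing overflows; the largest term is exp(0) = 1.
def softmax_stable(x):
    e = np.exp(x - x.max())
    return e / e.sum()

logits = np.array([1000.0, 1001.0, 1002.0])
print(softmax_naive(logits))   # overflows: [nan nan nan] (with a warning)
print(softmax_stable(logits))  # finite: ≈ [0.090, 0.245, 0.665]
```

The same idea matters even more in float16, where `exp` overflows at much smaller inputs, which is why upcasting to float32 plus max-subtraction comes up repeatedly in the talk's summary above.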

  • @danielhanchen · 3 days ago

    Thank you for inviting me! Let me know what you guys would like me to talk about next time! 😊 Also if you guys want me to clarify something, I'll try my best to reply! 🙏

    • @kumesh2785 · 1 day ago

      Great talk! Can you share insights on how to read others' codebases effectively, now that you have achieved such great pace? Thanks!

  • @Player-oz2nk · 3 days ago

    Cursor plus codeium is BEAST

    • @maxwelljiang4729 · 3 days ago

      whoa :0 what's your setup here? curious what cursor provides that codeium's missing, and vice versa

  • @KevinHou22 · 4 days ago

    Thank you for having me! Always great to share what we’ve learned with the world.

    • @StephenRayner · 1 day ago

      Just stopped hacking away on a Saturday on a coding assistant I am building, to cut the lawn. So glad I clicked on this video. The talk was excellent; really enjoyed the insight into how you and your team are tackling the context issue.

    • @KevinHou22 · 11 hours ago

      @StephenRayner Sounds like a great Saturday. Thanks for watching! Glad you were able to learn something new.

  • @taptnsovereigns4024 · 4 days ago

    Clutch

  • @Troll3rHD · 4 days ago

    this guy is a war machine

  • @JustSuds · 4 days ago

    I am dying for access to GitHub Copilot Workspace.

  • @manncodes · 4 days ago

    that one guy on the right is annoying!!

  • @Jay-wx6jt · 4 days ago

    This guy is a gem. Keep it up

  • @muhannadobeidat · 4 days ago

    Thanks for the talk, and to AIE for making the video available with such great production quality. One thing I missed is: what do you train on? Especially since generated code will have ownership and IP concerns/issues, is this custom-trained on an organization's private codebase?

    • @rohanp26 · 3 days ago

      They don't train on GPL or non-permissive code. You can use Codeium in your organization (aka Codeium doesn't claim IP over generated code). Source: Codeium FAQ

  • @TimothyJoh · 4 days ago

    Brilliant talk. As an engineering leader, I find this approach way more powerful than the state of the art from the larger companies. Appreciate the deep dive, and I can see all the deep thinking going into your product.

    • @KevinHou22 · 4 days ago

      Much appreciated 🙂 glad you enjoyed the talk

  • @UserUser-he5zm · 4 days ago

    Justine Tunney was caught plagiarizing code from a user on llama.cpp and banned by its creator, Georgi Gerganov

  • @codeiumdev · 4 days ago

    Shoutout to the AI Engineer World's Fair for having us and Kevin Hou! We're constantly improving our code retrieval system and working on tough problems at the cutting edge of AI. We're hiring across all roles!

  • @lemonsqueeezey · 4 days ago

    Watching so far, and I like how engaging the workshop is! We need a workshop for Triton or CUDA entirely.

    • @danielhanchen · 3 days ago

      Could definitely be an interesting topic! 👀

    • @user-xk3tj5cj8p · 3 days ago

      @danielhanchen yes, Triton please 🎉🎉

  • @blockchainstreet · 4 days ago

    Good one!!

  • @memehub2002 · 4 days ago

    first

  • @gu5on16 · 5 days ago

    thank you

  • @easterngap2912 · 5 days ago

    Devin, trained on the collective mistakes of two generations of Stack Overflow users. Imagine if the control systems on the nuclear submarine parked 100 km offshore were written by Devin. Or, more likely, an e-commerce website. Imagine what happens when Devin has a transitory hallucination: my clients celebrate negative prices, and I go broke in minutes. The best use case is Devin creating the code that is used as examples for teaching debugging.

  • @ajitabhkumar5449 · 5 days ago

    Amazing 👏👏👏. Just tried it on Ubuntu with ARM (aarch64), CPU only. Works great. Also, RAM usage is quite low, which is surprising.

  • @jianghong6444 · 6 days ago

    At 8:16 the presenter is comparing MAX against llama.cpp using CPU for inference, but the main contributor of llamafile claims that llama.cpp mainly focuses on the GPU stack (which sort of makes sense, since CPU can be comparatively slower), so I'm not sure how big of an impact that would be.

  • @omercelebi2012 · 6 days ago

    What about the quality trade-off? Did they mention that?

  • @ImSaran · 7 days ago

    The Real AI

  • @john_blues · 7 days ago

    Is there a way to get Windows to run llamafiles bigger than 4 GB? Without that, it is very limiting in which models you can run.

  • @afish5581 · 7 days ago

    Awesome presentation. That was so well done 👍🏻

  • @bhagwandassoni3737 · 8 days ago

    In Java.

  • @GandalfTheBrown117 · 8 days ago

    Justine is a GOAT

  • @GandalfTheBrown117 · 8 days ago

    Tired -> wired around @9:30 😂

  • @Viewable11 · 8 days ago

    Llamafile now supports OpenAI API and non-AVX CPUs. Finally! Having the OpenAI API is a must.

  • @christopherprobst-ranly6357 · 9 days ago

    Outside of the Python AI bubble this is so old and natural that you would never call it an invention 😂 Well, that's what happens when some data scientists try to host their Jupyter Notebook 😂

  • @bobtarmac1828 · 9 days ago

    Free candy, I mean, free open source AI for everyone. It's like a trick. Don't fall for it. Cease AI.

  • @mso2802 · 10 days ago

    What music is that, by the way?

  • @JL-1735 · 10 days ago

    I have zero interest in Modular or in MAX as long as it's not fully open source. They have the right to make it closed, but "we are making some things open" without any clarity or guarantee that the rest of the stack will eventually become open is equal to it just being closed source. I would consider it a rug pull, as Chris has been teasing the community and earning positive press as if it's an open source project.

    • @LisaSamaritan · 3 days ago

      He explains it on Lex Fridman's podcast #381; you can jump to 02:21:57. But basically, he had a bad experience making Swift, where everyone wanted new functionality at the same time as the core parts were being developed, which led to a bunch of bugs and rewrites, and he doesn't want to make that mistake again. He will release parts as they become stable enough, so that this will not happen.

    • @LisaSamaritan · 3 days ago

      Besides, all of his other projects* are open source, so why do you think he wouldn't do it again?
      * The LLVM/MLIR compiler, the Clang compiler, the Swift programming language.
      The biggest question was surrounding MAX. MAX is written in Mojo but isn't part of the language. It now has a free license for local/on-prem use; you have to pay for using it in the cloud and for commercial support. [Also, nothing prevents you from writing your own MAX-like solution in Mojo... Modular has to make money somehow, and the license seems fair: most people get it for free, and the ones that can afford to pay will pay.] But even without MAX you will have Mojo, which is as simple to use as Python and can run any Python program at an expected 2-10x speed improvement (compared to Python's own interpreter, without any optimization), or a 10-100x improvement if you use the Mojo-specific low-level parts (basically like writing a part in Rust). In rare cases you can get a greater improvement; there is some algorithm that has shown a 36000x speedup (if I remember correctly). As with everything, whatever extra speed you get depends on many factors.

  • @RickySupriyadi · 10 days ago

    OMG, if there is a new standard API for communicating with LLMs, that would really change the world if they all use this standard: automation in simple steps! Uh, but what about security... like rogue LLMs roaming around and exploiting those APIs. Wow, more talks like these please. Oh, and if it's an open source LLM communicating with those APIs, might it be more secure? Maybe.