Pushing This AI Beast To Its Limit (2x RTX 6000)🔥

Published 2024-06-20
Learn more about Dell Precision AI-ready workstations - dell.to/3zNGNw7

Thanks to Dell and NVIDIA ❤️

Be sure to check out Pinecone for all your Vector DB needs: www.pinecone.io/

Join My Newsletter for Regular AI Updates 👇🏼
www.matthewberman.com/

Need AI Consulting? 📈
forwardfuture.ai/

My Links 🔗
👉🏻 Subscribe: youtube.com/@matthew_berman
👉🏻 Twitter: twitter.com/matthewberman
👉🏻 Discord: discord.gg/xxysSXBxFW
👉🏻 Patreon: patreon.com/MatthewBerman
👉🏻 Instagram: www.instagram.com/matthewberman_ai
👉🏻 Threads: www.threads.net/@matthewberman_ai
👉🏻 LinkedIn: www.linkedin.com/company/forward-future-ai

Media/Sponsorship Inquiries ✅
bit.ly/44TC45V

Disclosures:
I am an in

All Comments (21)
  • @trader548
    Me watching this in 10 years' time: "Dude, I got more power in my smartwatch."
  • @adamstewarton
    30k to run a 70B model at under 10 t/s? That's a big no from me 😂
  • You should use this monster machine to set up a series of agents interacting together, each with a different specification, running in parallel. Like running a village of NPCs, or doing some really complex tasks that break down into a lot of small tasks (see the first sketch after the comments).
  • @Interloper12
    Months from now, LLM technology improves to the point where we can run 1000 models in parallel. Consumer machines have 100 GPUs packed inside. Inference is lightning fast. Tokens per second is staggering. And just when you think you have all the world's power at your fingertips, each model still thinks that marbles somehow stick to cups when you place them upside down inside microwaves.
  • @hipotures
    My electricity bills have increased by 800% since I started using ML/LLM models locally :(
  • @descmba
    One area where I am trying to find a better solution is having an LLM parse PDF documents, specifically technical documents that have chapters, instructions, and images with captions. It would be interesting to see how the larger models could be used to train smaller ones on this task (a parsing sketch follows the comments).
  • @ashtwenty12
    You'd better be running a monster agent system now with this rig. Please ask it something super complicated, maybe to write a novel.
  • @stonedoubt
    I built my own for a little over $10k: a Threadripper 7960X, 128 GB DDR5, and 3x MSI RTX 4090 Liquid Suprim X in a massive Lian Li V3000 case.
  • @MikeEpler
    How about some multi-model agent workflow testing?
  • @chadnice
    12:34 Sampling steps are set to 30. Go to the settings tab and try the quality option; this should change it to 60 steps. There is also an upscaling option: try it with an image you generated by dragging it to the bottom, then test 2x or whatever is highest. If you really want to push it, keep upscaling the same image (a diffusers sketch of the step bump follows the comments).
  • I'm not SHOCKED. I have been running 5x P40s (120 GB VRAM) for a year now. I spent <1000 euros. No regrets.
  • @TheMattgwapo
    There is a way to load a model into VRAM once and then have multiple requests use it at the same time; your demo is loading the same model into memory over and over (see the batching sketch after the comments).
  • @drewski6843
    Ahhh I want one! lol! Lucky, you're able to test it out. Congrats!
  • @afrikai
    Congratulations! Is this the computer you used for your worm simulation?
  • @360_SA
    Can you use the VRAM as one 96 GB pool, or only as 2 individual 48 GB cards? (Answered in the sharding sketch after the comments.)
  • OMFG, your face is melting with happiness! Good for you, Matt, and thanks for all the great videos! Kudos!
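
Sketches referenced in the comments above (editor's additions):

On the village-of-NPCs idea: a minimal sketch of agents queried in parallel, assuming a local OpenAI-compatible server is already running on localhost:8000 (vLLM, llama.cpp, and Ollama can all expose one). The model name "llama-3-70b" and the personas are invented for illustration.

import asyncio
from openai import AsyncOpenAI

# Assumed local endpoint; any OpenAI-compatible server works here.
client = AsyncOpenAI(base_url="http://localhost:8000/v1", api_key="not-needed")

PERSONAS = {
    "blacksmith": "You are a gruff village blacksmith.",
    "innkeeper": "You are a chatty village innkeeper.",
    "guard": "You are a suspicious village guard.",
}

async def ask(name: str, persona: str, event: str) -> str:
    # One agent = one system prompt; every agent shares the same loaded model.
    resp = await client.chat.completions.create(
        model="llama-3-70b",  # assumed name registered with the server
        messages=[
            {"role": "system", "content": persona},
            {"role": "user", "content": f"React to this event: {event}"},
        ],
    )
    return f"{name}: {resp.choices[0].message.content}"

async def main():
    event = "A stranger rides into the village at dusk."
    replies = await asyncio.gather(*(ask(n, p, event) for n, p in PERSONAS.items()))
    print("\n".join(replies))

asyncio.run(main())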
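On parsing technical PDFs: a small sketch using PyMuPDF to pull per-page text and count embedded images, producing chunks an LLM could then label. The file name is hypothetical, and real chapter/caption detection would need per-document tuning.

import fitz  # PyMuPDF

doc = fitz.open("manual.pdf")  # hypothetical input file
chunks = []
for page in doc:
    text = page.get_text("text")  # plain text in reading order
    n_images = len(page.get_images(full=True))  # images that may carry captions
    chunks.append(f"[page {page.number + 1}, {n_images} image(s)]\n{text}")

# Each chunk could then be sent to a local LLM with a prompt such as:
# "Extract the chapter title, instructions, and figure captions from this page."
print(chunks[0][:500])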
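On the 30-vs-60 sampling steps tip: the video drives a web UI, but the same knob in diffusers looks like this, assuming SDXL; the model id and prompt are placeholders.

import torch
from diffusers import StableDiffusionXLPipeline

pipe = StableDiffusionXLPipeline.from_pretrained(
    "stabilityai/stable-diffusion-xl-base-1.0", torch_dtype=torch.float16
).to("cuda")

# 30 steps is a common default; doubling to 60 trades generation time for detail.
image = pipe("a llama assembling a workstation", num_inference_steps=60).images[0]
image.save("llama_60_steps.png")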
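On loading the model once and serving many requests at the same time: this is what vLLM's continuous batching does. A sketch, assuming both GPUs are visible; the 8B model id keeps it runnable in 96 GB at fp16 (a 70B would need a quantized build to fit).

from vllm import LLM, SamplingParams

# Load the weights once; tensor_parallel_size=2 shards that single copy
# across both GPUs instead of reloading the model per request.
llm = LLM(model="meta-llama/Meta-Llama-3-8B-Instruct",  # assumed model id
          tensor_parallel_size=2)

prompts = [f"Question {i}: explain KV caching in one sentence." for i in range(8)]
outputs = llm.generate(prompts, SamplingParams(max_tokens=64))  # batched together
for out in outputs:
    print(out.outputs[0].text.strip())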
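On using 2x 48 GB as one 96 GB pool: the cards do not merge into a single device, but frameworks can shard one model across both so it behaves that way for inference. A sketch with transformers + accelerate + bitsandbytes; the model id is an assumed example, and 4-bit quantization is what lets a 70B (~140 GB at fp16) fit in 96 GB.

import torch
from transformers import AutoModelForCausalLM, AutoTokenizer, BitsAndBytesConfig

model_id = "meta-llama/Meta-Llama-3-70B-Instruct"  # assumed example
tok = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    device_map="auto",  # accelerate places layers across both GPUs automatically
    quantization_config=BitsAndBytesConfig(
        load_in_4bit=True, bnb_4bit_compute_dtype=torch.float16
    ),
)

inputs = tok("Hello from two GPUs:", return_tensors="pt").to(model.device)
print(tok.decode(model.generate(**inputs, max_new_tokens=20)[0]))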