Polars: The Next Big Python Data Science Library... written in RUST?

Published 2022-12-29
In this video tutorial I explain everything you need to get started coding with Polars. Polars is a multi-threaded DataFrame library, meaning it can use all the cores of a computer at the same time. It has shown large performance gains over pandas in speed tests, including the one in this video.

Timeline:
00:00 Intro
01:00 What is Polars?
02:43 Getting Started
06:32 Filtering
07:15 New Columns
08:10 Groupby
08:55 Combining Dataframes
10:17 Multithreaded Approach
11:21 Speed Test
12:50 Takeaways

Follow me on twitch for live coding streams: www.twitch.tv/medallionstallion_

My other videos:

Speed Up Your Pandas Code: Make Your Pandas Code Lightning Fast
Intro to Pandas video: A Gentle Introduction to Pandas Data ...
Exploratory Data Analysis Video: Exploratory Data Analysis with Pandas...

Working with Audio data in Python: Audio Data Processing in Python
Efficient Pandas Dataframes: Speed Up Your Pandas Dataframes

* YouTube: youtube.com/@robmulla?sub_confirmation=1
* Discord: discord.gg/HZszek7DQc
* Twitch: www.twitch.tv/medallionstallion_
* Twitter: twitter.com/Rob_Mulla
* Kaggle: www.kaggle.com/robikscube

#python #polars #datascience

All Comments (21)
  • @rahuldev2380
    Polars is built on top of Apache Arrow which pandas supports. So you can easily convert your polars dataframe to pandas with almost zero overhead. I use polars to do the hard part and jump back to pandas for the visualization stuff
  • @brd5548
    Our team tried to integrate polars into our analytics pipeline last year, and the results were mixed. To be honest, the performance of pandas is not that bad. We spent some time fine-tuning, like rewriting key bottlenecks with our own native modules or with vectorized pandas methods, and the result turned out just OK. On the other hand, integrating polars required major revamping and refactoring, due to API gaps and implementation differences between the two, and the performance gains didn't seem to justify the effort. What's worse, while pandas does come with pitfalls and caveats here and there, polars is a relatively young project and it came with bugs in basic text-manipulation operations. But don't get me wrong, that was my experience last year. I do think polars has potential. It has a much more robust and modern architecture than pandas, in my opinion. Its API style is cleaner and more consistent. And it comes with a query optimization engine, which you can appreciate if you are familiar with tools like Apache Spark or some databases. Given time, I think polars should become another powerful player. So definitely give it a try if you're building something new!
  • @bigphab7205
    10000 points for printing the version. Every tutorial video should do that.
  • 13:20 Regarding learning the syntax… It’s worth mentioning that Polars syntax is very similar to PySpark, so it’s really two birds with one stone.
  • Nice video. Very interesting to see how Polars works. I hope to see it more often in your future streams to learn more about its practical use.
  • Great timing, I was looking to start playing with Polars since Mark Tenenholtz mentioned it some days ago. I went back to Pandas because I couldn't find the assign() and astype() equivalents in Polars. I thought they were lacking, but they seem to be with_columns() and cast(). Now I will resume more persistently.
  • @scraps7624
    I saw some tweets about Polars, but seeing it in action is something else. Also, I can't believe it took me this long to find your channel, subbed!
  • @juan.o.p.
    Thanks for the recommendation, I will definitely give it a try 😊
  • @jcbritobr
    Nice stuff. Polars seems like a killer tool. Thank you for sharing.
  • @rohitnair4268
    As usual, Rob, nice video. I have learned a lot from you.
  • @calum.macleod
    Thanks for a good explanation of how Polars could benefit people who use Pandas and need more speed. In my project we already have a heavy emphasis on multiprocessing and fast inter-process communication, so I am especially interested to see a Pandas vs Polars single-core performance comparison for group and join. I hope that someone does the comparison and posts it to YouTube.
  • @tmb8807
    I'm blown away by how fast this is. Sure there are some things it can't do, but man, even for just reading large data sets it's absolutely blazing.
  • Thanks for bringing this to my attention. I think I might include polars in some productionalization processes. For data exploration, I typically only use parts of dataframes for plotting or investigation. Given that you can convert a polars dataframe to pandas, it seems like a good approach would be to keep the full dataset in polars and then filter into a pandas dataframe and plot.
  • DataTable is also pretty legendary, you might also find it super awesome. Thanks again for your amazing videos, I have watched and learned from every one of them. I hope I'll interview you about your 100k celebration sometime next year 🙏
  • Hi @rub, I think it's a good approach to diversify our tools these days, especially when it comes to dealing with memory (sometimes I find myself running out of time with pandas).