PyCon 2024 showcased a number of ways to speed the pokey Python programming language including sub-interpreters, immortal objects, just-in-time compilation and more.
When you need speed in Python, after profiling, checking for errors, and making damn sure you actually need it, you code the slow bit in C and call it.
When you need speed in C, after profiling, checking for errors, and making damn sure you actually need it, you code the slow bit in Assembly and call it.
When you need speed in Assembly, after profiling, checking for errors, and making damn sure you actually need it, you’re screwed.
Which is not to say faster Python is unwelcome, just that IMO its focus is frameworking, prototyping or bashing out quick and perhaps dirty things that work, and that’s a damn good thing.
Generally I bash together the one-off programs in Python and if I discover that my “one off” program is actually being run 4 times a week, that’s when I look at switching to a compiled language.
Case in point: I threw together a python program that followed a trajectory in a point cloud and erased a box around the trajectory. Found a python point cloud library, swore at my code (and the library code) for a few hours, tidied up a few point clouds with it, job done.
And then other people in my company also needed to do the same thing and after a few months of occasional use, I rewrote it using C++ and Open3D. A few days of swearing this time (mainly because my C++ is a bit rusty, and Open3D’s C++ interface is a sparsely-documented back end to their main python front end).
End result though is that point clouds that took 3 minutes to process before in python now take 10 seconds, and now there’s a visualisation widget that shows the effects of the processing so you don’t have to open the cloud in another viewer to see that it was ok.
But anyway, like you said, python is good for prototyping, and when you hash out your approach and things are fairly nailed down and now you’d like some speed, jump to a compiled language and reap the benefits.
Python is also pretty good for production, provided you’re using libraries optimized in something faster. Is there a reason you didn’t use Open3D’s Python library? I’m guessing you’d get close to the same performance of the C++ code in a lot less time.
That said, if you’re doing an animation in 3D, you should probably consider a game engine. Godot w/ GDScript would probably be good enough, though you’d spend a few days learning the engine (but the next project would be way faster).
If you’re writing a performance-critical library, something compiled is often the better choice. If you’re just plugging libraries together, something like Python is probably a better use of your time since the vast majority of CPU time can generally be done in a library.
While I agree with most of what you say, I have a personal anecdote that highlights the importance of performance as a feature.
I have a friend that studies economics and uses python for his day to day. Since computer science is not his domain, he finds it difficult to optimize his code, and learning a new language (C in this case) is not really an option.
Some of his experiments take days to run, and this is becoming a major bottleneck in his workflow. Being able to write faster code without relying on C is going to have a significant impact on his research.
Of course, there are other ways to achieve similar results, for example another friend is working on DIAS a framework that optimizes pandas in the runtime. But, the point still stands, there are a tonne of researchers relying on python to get quick and dirty results, and performance plays a significant in that when the load of data is huge.
Sure, I was being mildly facetious, but pointing to a better pattern, the nature of python means it is, barring some extreme development, always going to be an order of magnitude slower than compiled. If you’re not going to write even a little C, then you need to look for already written C / FORTRAN / (SQL for data) / whatever that you can adapt to reap those benefits. Perhaps a general understanding of C and a good knowledge of what your Python is doing is enough to get a usable result from a LLM.
My coworker is a Ph.D in something domain specific, and he wrote an app to do some complex simulation. The simulation worked well on small inputs (like 10), but took minutes on larger inputs (~100), and we want to support very large inputs (1000+) but the program would get killed with out of memory errors.
I (CS background) looked over the code and pointed out two issues:
bubble sort in a hot path
allocated all working memory at the start and used 4D arrays, when 3D arrays and a 1D result array would’ve sufficed (O(n4) -> O(n3))
Both problems would have been solved had they used Python, but they used Fortran because “fast,” but it doesn’t have builtin sort or data structures. Python provides classes, sortable lists (with quicksort!), etc, so they could’ve structured their code better and avoided the architectural mistakes that caused runtime and memory to explode. Had they done that, I could’ve solved performance problems by switching lists to numpy arrays and throwing numba on the hot loops and been done in a day, but instead we spent weeks rewriting it (nobody understands Fortran, and that apparently included the original dev).
Python lets you focus on the architecture. Compiled languages often get you stuck in the weeds, especially if you don’t have a strong CS background and just hack things until it works.
I’d really like to see Rust fit in where C(++) does now for Python. I know some libraties do it (e.g. Pydantic), but it really should be more common. It should work really well with the GIL… (or the TIL or whatever the new one is)
Well, it is happening, I just don’t know how “blessed” it is by Python maintainers (i.e. are Python releases blocked by Rust binding updates?). It’s 100% possible today and there are projects that use Rust bindings, I just don’t know how that fits in with Python development vs the C++ API.
Or you could use cython, which is much easier to integrate with a python project. It is only marginally slower than Rust but a little less safe. Numpy libraries are usually the fast. Numba is a little clunky, but can also speed up code. There’s lots of options to speed up python code.
You can also use numba if you just need to accelerate one part of the app. We did that with a heavy part of the app and our naïve Python (using numpy) was about as fast as our naïve Rust, but only when wr turned on parallel processing in numba (I could’ve easily beat it with parallel Rust, but that requires extra work and wouldn’t fit as nicely into the rest of the app).
When you need speed in Python, after profiling, checking for errors, and making damn sure you actually need it, you code the slow bit in C and call it.
When you need speed in C, after profiling, checking for errors, and making damn sure you actually need it, you code the slow bit in Assembly and call it.
When you need speed in Assembly, after profiling, checking for errors, and making damn sure you actually need it, you’re screwed.
Which is not to say faster Python is unwelcome, just that IMO its focus is frameworking, prototyping or bashing out quick and perhaps dirty things that work, and that’s a damn good thing.
Generally I bash together the one-off programs in Python and if I discover that my “one off” program is actually being run 4 times a week, that’s when I look at switching to a compiled language.
Case in point: I threw together a python program that followed a trajectory in a point cloud and erased a box around the trajectory. Found a python point cloud library, swore at my code (and the library code) for a few hours, tidied up a few point clouds with it, job done.
And then other people in my company also needed to do the same thing and after a few months of occasional use, I rewrote it using C++ and Open3D. A few days of swearing this time (mainly because my C++ is a bit rusty, and Open3D’s C++ interface is a sparsely-documented back end to their main python front end).
End result though is that point clouds that took 3 minutes to process before in python now take 10 seconds, and now there’s a visualisation widget that shows the effects of the processing so you don’t have to open the cloud in another viewer to see that it was ok.
But anyway, like you said, python is good for prototyping, and when you hash out your approach and things are fairly nailed down and now you’d like some speed, jump to a compiled language and reap the benefits.
And at that point you’ll also have a better idea of the problem and solution.
Python is also pretty good for production, provided you’re using libraries optimized in something faster. Is there a reason you didn’t use Open3D’s Python library? I’m guessing you’d get close to the same performance of the C++ code in a lot less time.
That said, if you’re doing an animation in 3D, you should probably consider a game engine. Godot w/ GDScript would probably be good enough, though you’d spend a few days learning the engine (but the next project would be way faster).
If you’re writing a performance-critical library, something compiled is often the better choice. If you’re just plugging libraries together, something like Python is probably a better use of your time since the vast majority of CPU time can generally be done in a library.
While I agree with most of what you say, I have a personal anecdote that highlights the importance of performance as a feature.
I have a friend that studies economics and uses python for his day to day. Since computer science is not his domain, he finds it difficult to optimize his code, and learning a new language (C in this case) is not really an option.
Some of his experiments take days to run, and this is becoming a major bottleneck in his workflow. Being able to write faster code without relying on C is going to have a significant impact on his research.
Of course, there are other ways to achieve similar results, for example another friend is working on DIAS a framework that optimizes pandas in the runtime. But, the point still stands, there are a tonne of researchers relying on python to get quick and dirty results, and performance plays a significant in that when the load of data is huge.
Sure, I was being mildly facetious, but pointing to a better pattern, the nature of python means it is, barring some extreme development, always going to be an order of magnitude slower than compiled. If you’re not going to write even a little C, then you need to look for already written C / FORTRAN / (SQL for data) / whatever that you can adapt to reap those benefits. Perhaps a general understanding of C and a good knowledge of what your Python is doing is enough to get a usable result from a LLM.
I have an alternative anecdote.
My coworker is a Ph.D in something domain specific, and he wrote an app to do some complex simulation. The simulation worked well on small inputs (like 10), but took minutes on larger inputs (~100), and we want to support very large inputs (1000+) but the program would get killed with out of memory errors.
I (CS background) looked over the code and pointed out two issues:
Both problems would have been solved had they used Python, but they used Fortran because “fast,” but it doesn’t have builtin sort or data structures. Python provides classes, sortable lists (with quicksort!), etc, so they could’ve structured their code better and avoided the architectural mistakes that caused runtime and memory to explode. Had they done that, I could’ve solved performance problems by switching lists to numpy arrays and throwing numba on the hot loops and been done in a day, but instead we spent weeks rewriting it (nobody understands Fortran, and that apparently included the original dev).
Python lets you focus on the architecture. Compiled languages often get you stuck in the weeds, especially if you don’t have a strong CS background and just hack things until it works.
I’d really like to see Rust fit in where C(++) does now for Python. I know some libraties do it (e.g. Pydantic), but it really should be more common. It should work really well with the GIL… (or the TIL or whatever the new one is)
Sounds like an excellent idea, I’d be surprised if it isn’t happening.
Well, it is happening, I just don’t know how “blessed” it is by Python maintainers (i.e. are Python releases blocked by Rust binding updates?). It’s 100% possible today and there are projects that use Rust bindings, I just don’t know how that fits in with Python development vs the C++ API.
Or you could use cython, which is much easier to integrate with a python project. It is only marginally slower than Rust but a little less safe. Numpy libraries are usually the fast. Numba is a little clunky, but can also speed up code. There’s lots of options to speed up python code.
Yup, Cython rocks.
You can also use numba if you just need to accelerate one part of the app. We did that with a heavy part of the app and our naïve Python (using numpy) was about as fast as our naïve Rust, but only when wr turned on parallel processing in numba (I could’ve easily beat it with parallel Rust, but that requires extra work and wouldn’t fit as nicely into the rest of the app).