William Falcon wants AI practitioners to spend more time on model development, and less time on engineering. PyTorch Lightning is a lightweight PyTorch wrapper for high-performance AI research that lets you train on multiple GPUs, TPUs, CPUs and even in 16-bit precision without changing your code! In this episode, we dig deep into Lightning, how it works, and what it is enabling. William also discusses the Grid AI platform (built on top of PyTorch Lightning). This platform lets you seamlessly train 100s of Machine Learning models on the cloud from your laptop.
Mark Van Doren said, “the art of teaching is the art of assisting discovery.” I saw that play out in this classroom using open source tools. More students need opportunities like this to help them gain a quality education. The Raspberry Pi 400 is a great form factor for teaching and learning.
Such a cool program that’d be easy to reproduce in your local library.
Today we’re talking with Brett Cannon. Brett is Dev Manager of the Python Extension for VS Code, a Python Steering Council member, and a core team member for Python. He recently shared a blog post, The social contract of open source, so we invited Brett to join us for Maintainer Week to discuss this topic in detail.
Thank a maintainer on us! We’re printing a limited run t-shirt that’s free for maintainers, and all you gotta do is thank them, today!
A nice primer on the many aspects of building full-text search, such as: data preparation, indexing, searching, term frequency, and computing relevance. It’s amazing what 150 lines of code can get done…
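To make those stages concrete, here is a minimal sketch of an inverted index with a TF-IDF-style relevance score. It is not the code from the linked article; the names `tokenize` and `TinyIndex`, the sample documents, and the exact scoring formula are all assumptions chosen for illustration.

```python
import math
import re
from collections import Counter, defaultdict


def tokenize(text):
    """Data preparation: lowercase the text and split it into alphanumeric tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())


class TinyIndex:
    """A toy inverted index (hypothetical): token -> {doc_id: term frequency}."""

    def __init__(self):
        self.postings = defaultdict(dict)
        self.doc_count = 0

    def add(self, doc_id, text):
        """Indexing: record how often each token occurs in the document."""
        for token, freq in Counter(tokenize(text)).items():
            self.postings[token][doc_id] = freq
        self.doc_count += 1

    def search(self, query):
        """Searching: return (doc_id, score) pairs, highest score first."""
        scores = Counter()
        for token in tokenize(query):
            matches = self.postings.get(token, {})
            if not matches:
                continue
            # Tokens that appear in fewer documents get a higher weight.
            idf = math.log(1 + self.doc_count / len(matches))
            for doc_id, freq in matches.items():
                scores[doc_id] += freq * idf
        return scores.most_common()


index = TinyIndex()
index.add(1, "Python full-text search in Python")
index.add(2, "Search engines and relevance")
index.add(3, "Cooking with cast iron")
results = index.search("python search")
```

Document 1 ranks first because it contains both query terms and mentions "python" twice; document 3 matches nothing and is left out of the results.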
Mongita is a lightweight embedded document database that implements a commonly-used subset of the MongoDB/PyMongo interface. Mongita differs from MongoDB in that instead of being a server, Mongita is a self-contained Python library. Mongita can be configured to store its documents either on disk or in memory.
I can’t speak to the implementation, but I love the idea behind this project. Already know and love Mongo? Here’s a way to use it in an embedded fashion with all of the advantages that come with such an architecture…
Supports checking Hy-Vee, Cosentino’s stores (KC), Ball’s stores (KC), Rapid Test KC, and locations checked by VaccineSpotter (including Walmart, Walgreens, CVS, Costco).
Supports sending notifications to Slack, Discord, Microsoft Teams, Twilio, and Twitter.
Notifications are sent when a location has appointments. No more notifications are sent for that location until it becomes unavailable again.
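That notify-once behavior amounts to tracking state transitions. A minimal sketch of the idea, assuming availability is polled as a list of location names; the function name `locations_to_notify` and the store names are hypothetical and not taken from the project:

```python
def locations_to_notify(previously_available, currently_available):
    """Return (locations that just became available, new state to remember).

    A location triggers a notification only on the transition from
    unavailable to available, so repeated polls stay quiet.
    """
    current = set(currently_available)
    newly_available = current - set(previously_available)
    return sorted(newly_available), current


state = set()
first, state = locations_to_notify(state, ["Store A"])              # A appears
second, state = locations_to_notify(state, ["Store A", "Store B"])  # only B is new
third, state = locations_to_notify(state, ["Store B"])              # A dropped out
fourth, state = locations_to_notify(state, ["Store A", "Store B"])  # A is back
```

Because "Store A" became unavailable in the third poll, it is eligible for a fresh notification in the fourth.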
Why do something like this? For the fun of it, mostly. Definitely not for this reason:
By creating a crypto trading bot that buys bitcoin every time the Tesla boss tweets about it you can rest assured that you are going to catch a VIP seat on the rocket that will slingshot right past the moon and make its way directly to Mars, where Elon spends most of the summer months due to its cold weather and dry climate.
Lulz aside, I love posts like this because they demonstrate how someone tied together a bunch of disparate things (Twitter API, trading API, regular expressions, etc.) to accomplish a real thing, no matter how silly/foolish that real thing is.
Also check out part 2 where he adds sentiment analysis. (Although, it’s hard for me –a human– to decipher Elon Musk’s tweets, so the results of said analysis are probably no better than flipping a coin.)
Print-based debugging is popular because it’s so quick and easy to put into practice. But what if we took Python’s built-in
- No need to first install the command on the other machine.
- Reference local files and paths like you would normally.
- Works across wildly different Linux distributions, like Ubuntu and Alpine.
The examples are showing
ffmpeg use (which is a great one), but how about
Here’s a pretty useful idea for library authors and their users: there are better ways to test your code!
I give three examples of how user projects can be self-tested without actually writing any real test cases by the end-user. One is hypothetical about
django and two examples are real and working: featuring
dry-python/returns. A brief example with
    import deal

    @deal.pre(lambda a, b: a >= 0 and b >= 0)
    @deal.raises(ZeroDivisionError)  # this function can raise if `b=0`, it is ok
    def div(a: int, b: int) -> float:
        if a > 50:  # Custom, in real life this would be a bug in our logic:
            raise Exception('Oh no! Bug happened!')
        return a / b

This bug can be automatically found by writing a single line of test code: test_div = deal.cases(div). As easy as it gets! From this article you will learn:
- How to use property-based testing on the next level
- How a simple decorator @deal.pre(lambda a, b: a >= 0 and b >= 0) can help you to generate hundreds of test cases with almost no effort
- What “Monad laws as values” is all about and how dry-python/returns helps its users to build their own monads
I really like this idea! And I would appreciate your feedback on it.
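For a rough feel of how a contract can drive test generation, here is a standard-library-only sketch. It is not how deal is implemented; the helper `find_counterexample`, the random integer range, and the trial count are assumptions made for illustration. The `div` function repeats the deliberate bug from the example above, without the decorators.

```python
import random


def find_counterexample(func, precondition, allowed_errors, trials=1000, seed=0):
    """Call func on random int pairs that satisfy the precondition.

    Returns the first (args, exception) whose exception is not listed in
    allowed_errors, or None if every trial passed.
    """
    rng = random.Random(seed)
    for _ in range(trials):
        args = (rng.randint(-100, 100), rng.randint(-100, 100))
        if not precondition(*args):
            continue  # inputs outside the contract are not interesting
        try:
            func(*args)
        except allowed_errors:
            pass  # declared in the contract, so not a bug
        except Exception as exc:
            return args, exc
    return None


def div(a, b):
    if a > 50:  # the deliberate bug from the example above
        raise Exception('Oh no! Bug happened!')
    return a / b


found = find_counterexample(
    div,
    precondition=lambda a, b: a >= 0 and b >= 0,
    allowed_errors=(ZeroDivisionError,),
)
```

The precondition filters the inputs, the declared exception is tolerated, and anything else is reported as a failure, which is the same division of labor the contract decorators express.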
SpeechBrain is an open-source and all-in-one speech toolkit based on PyTorch.
The goal is to create a single, flexible, and user-friendly toolkit that can be used to easily develop state-of-the-art speech technologies, including systems for speech recognition, speaker recognition, speech enhancement, multi-microphone signal processing and many others.
Currently in beta.
If you’re adventurous and you want to learn to distinguish between couch #1 and couch #2 (i.e. 2 meters apart), it is the most robust when you switch locations and train in turn. E.g. first in Spot A, then in Spot B then start again with A. Doing this in spot A, then spot B and then immediately using “predict” will yield spot B as an answer usually. No worries, the effect of this temporal overfitting disappears over time. And, in fact, this is only a real concern for the very short distances. Just take a sample after some time in both locations and it should become very robust.
Dropbox Engineering tells the tale of their new SOA:
The majority of software developers at Dropbox contribute to server-side backend code, and all server side development takes place in our server monorepo. We mostly use Python for our server-side product development, with more than 3 million lines of code belonging to our monolithic Python server.
It works, but we realized the monolith was also holding us back as we grew.
This is an excellent, deep re-telling of their goals, decisions, setbacks, and progress. Here’s the major takeaway, if you don’t have time for a #longread:
The single most important takeaway from this multi-year effort is that well-thought-out code composition, early in a project’s lifetime, is essential. Otherwise, technical debt and code complexity compounds very quickly.
Graphtage is a commandline utility and underlying library for semantically comparing and merging tree-like structures, such as JSON, XML, HTML, YAML, plist, and CSS files. Its name is a portmanteau of “graph” and “graftage”—the latter being the horticultural practice of joining two trees together such that they grow as one.
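To show what comparing tree-like structures involves at the simplest level, here is a naive sketch that walks two parsed JSON-like values key by key and position by position. This is not Graphtage's algorithm or API; the function `diff`, the path format, and the sample data are assumptions, and a real tool would match list items far more cleverly than by index.

```python
def diff(old, new, path=""):
    """Yield (path, old_value, new_value) for each difference found.

    None stands in for "missing", so this toy version cannot tell a
    missing key apart from an explicit null.
    """
    if isinstance(old, dict) and isinstance(new, dict):
        for key in sorted(old.keys() | new.keys(), key=str):
            child = f"{path}/{key}"
            if key not in old:
                yield child, None, new[key]
            elif key not in new:
                yield child, old[key], None
            else:
                yield from diff(old[key], new[key], child)
    elif isinstance(old, list) and isinstance(new, list):
        for i in range(max(len(old), len(new))):
            child = f"{path}/{i}"
            if i >= len(old):
                yield child, None, new[i]
            elif i >= len(new):
                yield child, old[i], None
            else:
                yield from diff(old[i], new[i], child)
    elif old != new:
        yield path, old, new


old = {"name": "graph", "tags": ["a", "b"], "meta": {"v": 1}}
new = {"name": "graph", "tags": ["a", "c", "d"], "meta": {"v": 2, "x": True}}
changes = list(diff(old, new))
```

Comparing parsed structures rather than raw text means key order and whitespace do not show up as changes.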
I can’t imagine why anybody would be building an awesome stock market terminal right now with loads of features such as stock discovery, market sentiment analysis, research tools, FA, TA, DD, and more.
Maybe they just like the stock? 😏
Depending on how you compile Python, you can get significant differences in performance, and that is reflected in performance differences in real-world versions of Python, like Ubuntu and the official Docker images.
You may recall spaCy from this episode of Practical AI with its creators. If not, now’s a great time to introduce yourself to the project. 3.0 looks like a fantastic new release of the wildly popular NLP library. The list of new and improved things is too long for me to reproduce here, so go check it out for yourself.
This has been in the works for ~2 years now and finally dropped on January 23rd, 2021. It’s amazing how much work it takes to upgrade a community as large and broadly-interested as Python’s.
Getting the de facto tool for installing packages off Python 2 seems like a pretty big moment in that effort to me, but I’m only a casual observer/fan of the language. I’d love to hear from folks who use Python on the daily. Is this a big deal?
Superset can query data from any SQL-speaking datastore or data engine (e.g. Presto or Athena) that has a Python DB-API driver and a SQLAlchemy dialect.
This has been around long enough to be picked up by the Apache Foundation, but somehow it’s avoided my radar until today. The visualizations you can achieve with it are impressive, to say the least.
Guido van Rossum:
I decided that retirement was boring and have joined the Developer Division at Microsoft. To do what? Too many options to say! But it’ll make using Python better for sure (and not just on Windows :-). There’s lots of open source here. Watch this space.
Late last year Guido left Dropbox to head into retirement. Apparently “retirement was boring.” I’m curious to see how coming out of retirement changes things at the steering level of Python.
We talked with Brett Cannon in the middle of last year about Python’s new governance and core team. I don’t recall their plan accounting for the possibility of their BDFL coming back from retirement. 😱
I’m sure whatever is to come for Python with Guido being back, it’ll be a net positive.
- What Higher Kinded Types (HKTs) are and why they are useful
- How they are implemented and what limitations there are
- How can you use them in your own projects
Without further ado, let’s talk about typing!
Craig Kerstiens told me about this on our recent Postgres episode of The Changelog and my jaw about dropped out of my mouth.
… earlier today I was starting to wonder why couldn’t I do more machine learning directly inside [Postgres]. Yeah, there is madlib, but what if I wanted to write my own recommendation engine? So I set out on a total detour of a few hours and lo and behold, I can probably do a lot more of this in Postgres than I realized before. What follows is a quick walkthrough of getting a recommendation engine setup directly inside Postgres.
Craig doesn’t necessarily suggest you put this kind of solution in production, but he doesn’t come out and say don’t do it either. 😉
Mimesis… provides data for a variety of purposes in a variety of languages. The fake data could be used to populate a testing database, create fake API endpoints, create JSON and XML files of arbitrary structure, anonymize data taken from production and etc.
Data generators like Mimesis are fun to use (and I imagine fun to code as well):
    >>> from mimesis import Person
    >>> person = Person('en')
    >>> person.full_name()
    'Brande Sears'
    >>> person.email(domains=['mimesis.name'])
    'firstname.lastname@example.org'
    >>> person.email(domains=['mimesis.name'], unique=True)
    'email@example.com'
    >>> person.telephone(mask='1-4##-8##-5##3')
    '1-436-896-5213'
This is not a tutorial on using Git! To follow along I advise that you have working knowledge of Git. If you’re a newcomer to Git, this tutorial is probably not the best place to start your Git journey. I suggest coming back here after you’ve used Git a bit and you’re comfortable with making commits, branching, merging, pushing and pulling.