If you’ve worked with OSM data before, you know it’s not the easiest to extract. OSM data can be huge, and finding performant solutions for what you want to analyze is often a challenge. PyrOSM is a package that makes reading in and working with OSM data much more efficient. How? PyrOSM is built with Cython (Python compiled to C), uses faster libraries for deserializing OSM data, and adds smaller optimizations such as NumPy arrays, all of which let it process data quickly. If you’ve used OSMnx before (for very similar use cases), you know that large datasets take a very long time to load into memory, and that is where PyrOSM can help. Let’s get into what this library can do!
🌎 PBF Data
Let’s talk a bit about the specific file format that OSM data comes in. PBF stands for “Protocolbuffer Binary Format”, an efficient binary format for storing OSM data. The data is organized in “fileblocks”, which are groups of data that can be independently encoded or decoded. Fileblocks contain PrimitiveGroups, which in turn include thousands of OSM entities such as nodes, ways and relations.
The data can be scaled according to the user’s desired level of granularity. For instance, the current OSM database’s resolution is around 1 cm. In fact, if you wanted, you could download the entirety of OpenStreetMap’s data as one file, known as Planet (around 1000 GB of data)!
👩‍💻 PyrOSM Basics: Reading in Datasets
PyrOSM is a package that reads in OpenStreetMap’s PBF data from two main data providers: Geofabrik (world and country-level data) and BBBike (city-level data). The package gives the user access to many types of features:
- Buildings, POIs (points of interest), Land Use
- Street Networks
- Custom Filters
- Exporting as networks
- And more!
There are 235 cities across the world currently supported by BBBike, and you can get the full list by checking the “sources.cities.available” attribute. Getting started is easy enough: you simply initialize an OSM reader object and load in the data you want:
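Here is a minimal sketch of that step, assuming pyrosm's `get_data` download helper and `OSM` reader class. `is_supported_city` and `load_city` are hypothetical helper names made up for this post, not part of the library:

```python
def is_supported_city(name, available):
    """Case-insensitive check of a city name against a list of available names."""
    return name.lower() in {city.lower() for city in available}


def load_city(name="Berkeley", directory=None):
    """Download (or reuse) a city extract and return an OSM reader for it.

    Hypothetical wrapper for illustration; the pyrosm calls are get_data and OSM.
    """
    # Imported lazily so this sketch can be loaded before pyrosm is installed.
    from pyrosm import OSM, get_data

    # get_data fetches the .osm.pbf file if needed and returns its local path.
    fp = get_data(name, directory=directory)
    return OSM(fp)
```

Calling `osm = load_city("Berkeley")` would download the Berkeley extract on first use (so it needs a network connection) and hand back the reader object used in the rest of this post. The list to pass as `available` would come from `pyrosm.data.sources.cities.available`.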
From this point on, you use the OSM object to interact with the Berkeley data. Now let’s get the Berkeley street network for driving:
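A minimal sketch of that call, assuming pyrosm's `get_network` method and its `network_type` argument. `get_street_network` and `NETWORK_TYPES` are hypothetical names used only for illustration:

```python
# Network types that pyrosm's get_network accepts, to the best of my knowledge.
NETWORK_TYPES = ("walking", "cycling", "driving", "driving+service", "all")


def get_street_network(osm, network_type="driving"):
    """Return the street network of an already-initialized OSM reader."""
    if network_type not in NETWORK_TYPES:
        raise ValueError(f"network_type must be one of {NETWORK_TYPES}")
    # Returns the edges of the network as a GeoDataFrame.
    return osm.get_network(network_type=network_type)
```

With a reader called `osm`, `street_network = get_street_network(osm)` gives you the driving network, and `street_network.plot()` is a quick way to eyeball it.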
Printing out the actual street_network object shows it is stored in a GeoPandas GeoDataFrame with all the OSM attributes like length, highway, maxspeed etc., which can be very handy for further analysis.
Side note: BBBike (the source provider of this data) offers many more data formats of different sizes, including Organic Maps OSM, Garmin OSM or SVG Mapnik, depending on your use case.
🔍 Better Filtering
The results of the data loading above include all of Berkeley’s data and in fact even data from the cities neighboring it, which is not ideal. What if you want a much smaller or more specific area? That’s where using a bounding box comes in. To make a bounding box you can either:
- Manually specify a list of 4 coordinates in the format of [minx, miny, maxx, maxy]
- Pass in Shapely geometries (e.g. a LineString or MultiPolygon)
To find bounding box coordinates, I typically use this bbox finder website that lets you make rectangles and then copy the coordinates. Here’s how to bound the area around UC Berkeley’s campus and get its walking network:
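The bounding-box route can be sketched like this, assuming the `bounding_box` argument of pyrosm's `OSM` reader. The campus coordinates are approximate values picked by hand for illustration, and `validate_bbox` and `walking_network_in_bbox` are hypothetical helpers:

```python
# Approximate box around UC Berkeley's campus (assumed, hand-picked values).
# Order is [minx, miny, maxx, maxy], where x is longitude and y is latitude.
CAMPUS_BBOX = [-122.2730, 37.8650, -122.2480, 37.8780]


def validate_bbox(bbox):
    """Catch the two classic mistakes: reversed corners and swapped lat/lon."""
    minx, miny, maxx, maxy = bbox
    if not (-180 <= minx < maxx <= 180):
        raise ValueError("x values must be longitudes with minx < maxx")
    if not (-90 <= miny < maxy <= 90):
        raise ValueError("y values must be latitudes with miny < maxy")
    return [minx, miny, maxx, maxy]


def walking_network_in_bbox(fp, bbox=CAMPUS_BBOX):
    """Read only the data inside bbox from a .osm.pbf file and get its walking network."""
    # Imported lazily so this sketch can be loaded before pyrosm is installed.
    from pyrosm import OSM

    osm = OSM(fp, bounding_box=validate_bbox(bbox))
    return osm.get_network(network_type="walking")
```

Here `fp` is the path to a PBF file you have already downloaded. The validation step is optional, but mixing up latitude and longitude is easy to do when copying coordinates from a website.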
🎯 Exporting and Working with Graphs
Another good thing about PyrOSM is how it supports network processing and connects to other network analysis libraries. In addition to saving street networks as GeoDataFrames, PyrOSM lets you extract nodes and edges and stores them in two separate dataframes. Here’s the nodes one:
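A minimal sketch of how the two dataframes can be obtained, assuming the `nodes=True` flag of pyrosm's `get_network`. `get_nodes_and_edges` is a hypothetical wrapper name:

```python
def get_nodes_and_edges(osm, network_type="driving"):
    """Return (nodes, edges) for the street network of an OSM reader."""
    # With nodes=True, get_network returns two GeoDataFrames instead of one.
    nodes, edges = osm.get_network(nodes=True, network_type=network_type)
    return nodes, edges
```

With a reader called `osm`, `nodes, edges = get_nodes_and_edges(osm)` followed by `nodes.head()` prints the first few rows of the nodes table.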
Once you have these graph representations, it’s very easy to export them to various formats (OSMnx, igraph and Pandana) and work with them there.
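The export step can be sketched as follows, assuming pyrosm's `to_graph` method and its `graph_type` argument. `export_graph` and `GRAPH_TYPES` are hypothetical names, and the target library (NetworkX, igraph or Pandana) needs to be installed separately:

```python
# Graph backends that pyrosm's to_graph accepts, to the best of my knowledge.
GRAPH_TYPES = ("igraph", "networkx", "pandana")


def export_graph(osm, nodes, edges, graph_type="networkx"):
    """Convert nodes and edges dataframes into a graph object."""
    if graph_type not in GRAPH_TYPES:
        raise ValueError(f"graph_type must be one of {GRAPH_TYPES}")
    return osm.to_graph(nodes, edges, graph_type=graph_type)
```

As far as I know, the `"networkx"` output is the one meant to be used with OSMnx, so something like `G = export_graph(osm, nodes, edges)` should give you a graph you can hand to OSMnx functions.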
💭 Parting Thoughts
This was a short summary of what pyrosm can do for you in your geospatial work! I touched on some methods that can be very useful, like downloading datasets for a specific area, narrowing the area of interest with a bounding box, and connecting the results to other libraries. I think the best thing about pyrosm is exactly this: it bridges the gap between huge OSM datasets and the engineering or analytics questions you can answer with them.
Thanks for reading!