Blog | Ina Krapp

sfcentralities: Calculating centralities from sf objects

Tue, 10 Mar 2026 00:00:00 +0000

While I generally like to work with the sf library in R, the R ecosystem for spatial data is unfortunately quite heterogeneous: Spatial data does not only come as vector or raster data, but the structures in which it is stored are also heavily influenced by the design choices of individual package authors:

, probably the first widely used spatial package in R - at least, it is among the older ones, and , which is often treated as its successor.
, a package by the authors of “Spatial Point Patterns: Methodology and Applications with R”
is more often used for raster data, but also has capabilities to handle vector data, and is, in many ways, a successor to the package

…and then there is the option to store coordinates in an ordinary R dataframe with columns named ‘Longitude’ and ‘Latitude’ or ‘X’ and ‘Y’ and ignore spatial packages entirely.

These are just the packages and ways to work with spatial data I encountered most frequently. And spatio-temporal analysis is also growing more common (anyone interested in this field: I highly recommend taking a look at the package).

I think this is an unfortunate situation for beginners who aim to perform spatial analysis in R. Even if you have some experience, especially if it is limited to one package, working with another package can feel like starting from zero again. Many authors have valid reasons for their design choices, and there is a growing awareness of the issue; many packages also contain functions to transform data from one format to another. But sometimes, I find functions implemented in packages that do not see themselves as ‘spatial’ to begin with.

For example, when I tried to find out how to calculate the geometric median in R, the R package that appeared first in the search results was pracma - Practical Numerical Math Functions. The package is impressive for the breadth of functions it covers, but this result also shows how different the contexts are in which functions are used. The geometric median can be used by geographers to find a central point on the map, but is also used, for example, in principal component analysis - which is an entirely different story. There are more packages in R which offer functions for the calculation of geometric medians - Gmedian, for example, which is optimized for large, high-dimensional datasets and very fast due to its reliance on C++. So many users of the geometric median do not analyse spatial data.
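To make the idea of a geometric median a bit more concrete, here is a minimal sketch of Weiszfeld's algorithm, a common iterative way to approximate it for planar coordinates. It is written in plain Python and is not the implementation used by pracma, Gmedian or sfcentralities; the function names and the example points are made up for illustration, and the handling of an estimate that lands exactly on an input point is deliberately simplistic.

```python
import math

def sum_of_distances(points, location):
    """Sum of straight-line distances from `location` to every point."""
    return sum(math.hypot(px - location[0], py - location[1]) for px, py in points)

def geometric_median(points, tol=1e-9, max_iter=1000):
    """Approximate the geometric median of planar (x, y) points (Weiszfeld iteration)."""
    # Start from the centroid, i.e. the mean of the coordinates.
    x = sum(p[0] for p in points) / len(points)
    y = sum(p[1] for p in points) / len(points)
    for _ in range(max_iter):
        num_x = num_y = denom = 0.0
        for px, py in points:
            d = math.hypot(px - x, py - y)
            if d < 1e-12:
                # Simplification: skip a point the estimate sits on (avoids division by zero).
                continue
            num_x += px / d
            num_y += py / d
            denom += 1.0 / d
        if denom == 0.0:
            break  # every point coincides with the current estimate
        new_x, new_y = num_x / denom, num_y / denom
        shift = math.hypot(new_x - x, new_y - y)
        x, y = new_x, new_y
        if shift < tol:
            break
    return x, y

# Hypothetical example: four corners of a square plus one far-away outlier.
square = [(0.0, 0.0), (2.0, 0.0), (2.0, 2.0), (0.0, 2.0)]
square_median = geometric_median(square)

points = square + [(100.0, 100.0)]
centroid = (sum(p[0] for p in points) / len(points),
            sum(p[1] for p in points) / len(points))
median = geometric_median(points)

print("centroid:", centroid, "sum of distances:", sum_of_distances(points, centroid))
print("median:  ", median, "sum of distances:", sum_of_distances(points, median))
```

The outlier drags the centroid far away from the square, while the geometric median stays close to it. Note that this sketch treats the coordinates as plain numbers, so it is only meaningful for projected coordinates.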

But I did, for a project, and this is a part of where sfcentralities (more precisely: the function st_geo_median) comes from. It takes sf objects as input, it gives sf objects as output, and it does not reinvent the wheel, but is beginner-friendly and easy to use - at least I hope so. But it is not a package written purely with convenience in mind: Spatial data can be in projected or geographic coordinates, and purely number-based implementations like pracma or Gmedian, which cannot properly handle longitude and latitude values, can give wrong results for the latter - sfcentralities will give an error if the user attempts it.

Why can other packages give wrong results? Because distances in the longitude-latitude system are not constant: One degree of longitude corresponds to a different distance in meters depending on whether you are near the equator or near the poles. So, despite the sometimes sophisticated implementations of these packages, there are good reasons to use spatial packages for spatial data.
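A small worked example may help here. The sketch below uses the haversine formula on a spherical Earth (an approximation, with a mean radius of 6371 km) to measure how wide one degree of longitude is at different latitudes. The function name is my own choice for illustration and is not taken from any of the packages mentioned.

```python
import math

EARTH_RADIUS_KM = 6371.0  # mean Earth radius; treating the Earth as a sphere is an approximation

def haversine_km(lon1, lat1, lon2, lat2):
    """Great-circle distance in kilometres between two lon/lat points given in degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlam = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlam / 2) ** 2
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(a))

# Width of one degree of longitude at several latitudes.
for lat in (0, 30, 60, 85):
    width = haversine_km(0, lat, 1, lat)
    print(f"1 degree of longitude at {lat:>2} degrees latitude: {width:6.1f} km")
```

On this spherical model, one degree of longitude is roughly 111 km wide at the equator and only about half of that at 60 degrees latitude. An algorithm that treats longitude and latitude as ordinary x and y values ignores this, which is why its distances, and any median based on them, can be off.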

sfcentralities is also a bit of an attempt to bridge the gap between sf and the dodgr package. dodgr is one of the packages that show just how powerful R can be. dodgr stands for ‘Distances on Directed Graphs’, and while it can take sf objects as input and even offers a function to download data from OpenStreetMap in sf format to use for analysis, it also heavily builds on its own data format. It allows very fast and precise distance and time calculations even in complex street networks, since it is able to take into account different modes of transport, one-way streets and other factors that go beyond a simple ‘distance in kilometers/miles’ measure.

I have been using it to calculate a measure similar to the geometric median - a measure which minimizes the sum of distances to points - along a street network. This is not really equivalent to the geometric median because it does not necessarily fulfill a global optimality condition: The geometric median is calculated using Euclidean (straight-line) distances, so I can say that it is the point which minimizes the sum of distances to all points in a set. For a network distance, I can only say that a certain point I evaluated has a higher closeness than the other points I evaluated. This is the closeness centrality, and in sfcentralities, it is implemented in the function st_closeness_centrality.
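The idea of comparing only the points one has evaluated can be sketched on a toy network. The example below is plain Python with a hand-written Dijkstra search; it does not use dodgr and is not how st_closeness_centrality is implemented. The graph, its edge lengths and the node names are hypothetical. Edges are directed, so a one-way street simply has no edge in the opposite direction.

```python
import heapq

def shortest_distances(graph, source):
    """Dijkstra's algorithm: shortest network distance from `source` to every reachable node."""
    dist = {source: 0.0}
    queue = [(0.0, source)]
    while queue:
        d, node = heapq.heappop(queue)
        if d > dist.get(node, float("inf")):
            continue  # outdated queue entry
        for neighbour, weight in graph.get(node, {}).items():
            new_d = d + weight
            if new_d < dist.get(neighbour, float("inf")):
                dist[neighbour] = new_d
                heapq.heappush(queue, (new_d, neighbour))
    return dist

def total_network_distance(graph, candidate, targets):
    """Sum of network distances from `candidate` to all `targets` (infinite if one is unreachable)."""
    dist = shortest_distances(graph, candidate)
    return sum(dist.get(t, float("inf")) for t in targets)

# Hypothetical street network: node -> {neighbour: edge length}. D -> A is a one-way street.
graph = {
    "A": {"B": 1.0},
    "B": {"A": 1.0, "C": 2.0},
    "C": {"B": 2.0, "D": 1.0},
    "D": {"C": 1.0, "A": 1.5},
}
targets = ["A", "C", "D"]
candidates = ["A", "B", "C", "D"]

totals = {c: total_network_distance(graph, c, targets) for c in candidates}
best = min(totals, key=totals.get)
print(totals)
print("best evaluated candidate:", best)
```

The result only says which of the evaluated candidates has the smallest sum of network distances. A point somewhere along an edge might do better, which is exactly the missing global optimality condition described above.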

Wisp: A locally running version of Whisper

Sat, 13 Apr 2024 00:00:00 +0000

Whisper is transcription software that turns audio files into text. I created a locally running version of it, Wisp, with the aim of giving it a simple, intuitive user interface. My project can be found here:

Whisper itself has been developed by OpenAI, which is also the company behind ChatGPT and several other Artificial Intelligence programs. You can try it out here: ‚ ‘.

Unlike ChatGPT, Whisper does not have a user interface designed by OpenAI. Its demos, as you might have seen if you tried it out above, are often used by many people at the same time. Since they all send their requests to the same computer, people may have to wait a very long time before they receive the text. Alternatively, Whisper can be run locally, using Python, but for anyone who does not know how to use a programming language, this is not an option.

So the aim of my project was to make Whisper easy to use for anyone, on their own computer.

Wisp is supposed to run without an internet connection. Every user runs it on their own computer, meaning that users won't have to wait until the program has finished someone else's text. Since it has a graphical user interface, it is easy to use for anyone who is familiar with standard office software like Microsoft Office.

As with many of my other projects, I learnt a lot in the process of building this program. Before I started, I had no experience in working with audio data and very little with building a graphical user interface in Python. It was also the first time I turned a Python program into an executable Windows program.

An ARIMA model of the global average temperature

Sat, 12 Aug 2023 00:00:00 +0000

I wrote an ARIMA model to predict the average global temperature.

I would not call it a climate model because it only covers one of the many aspects of the climate. Of course, it is much more limited than the models developed by experts in the field. Still, writing it taught me a lot about time series forecasting.

Today, I would do some things differently. In particular, I would probably discourage people who use ARIMA models from interpolating missing data points. The ARIMA model can still be used when some data is missing. I originally used interpolation because I also experimented with ETS models, which require a time series without gaps. But the ETS model is not in the published version of the code because its predictions were not very good.

The code (with extensive commentary) can be downloaded . It is written in a Quarto document and should run on any relatively recent version of R and RStudio. The ‘Global_Temperature.txt’ and ‘merged_ice_core_yearly.csv’ files contain the data the model uses, so they have to be downloaded into the same folder as well to run the code. For anyone who just wants to take a look at the results and doesn't want to run or modify the code themselves, here is the .

Edit from October 19th, 2023: I uploaded a version that can be used for workshops in Germany. It is in the subfolder ‘Workshop’.