Offical data to R without Excel

· 1213 words · 6 minute read

Hey there! API calls are pretty common these days, and you’d expect that government offices have them all sorted out. Lucky us, today, we can to talk about how to connect to the Statistical Office of the Republic of Slovenia (SURS) database via an API call.

A little bit about PXWEB API

The SURS team was nice enough to use PXWEB, which is an API structure developed by Statistics Sweden and other national statistical institutions (NSIs) to disseminate public statistics in a structured way. This API allows you to download and use data from statistical agencies without using a web browser, directly over HTTP/HTTPS. You can see which other statistical offices use the same API structure here,

I won’t go too in-depth into the whole package, called pxweb. Instead, I’ll just present my use case.

Selecting the data

So you have a dataset that you find on the SURS website and you want to use it in your own analysis. For our case, we used something current: the amount of electricity produced in Slovenia in previous months. You can play with the data here

After you select the data you want to see, you have a tab called About table where you’ll find a bucket API query for that table. When you click on it, you’ll find an API URL and a JSON query. The first part (the URL) needs to be copied into our R script, and we only need to quickly go through the query. We need to remember the names of the variables in this dataset.

Some coding

#install.packages("pxweb")
# useful tip, in case of using ščćđ
#Sys.setlocale(locale="UTF-8") 

# our library
library(pxweb)

api_dataset <-
  "https://pxweb.stat.si:443/SiStatData/api/v1/en/Data/1817601S.px"
  

px_api_call <- pxweb_get(api_dataset)
# here we calling this URL request, that we get in previous chapther.

Ok, now we’ve saved our API call. Now we need to add a query to specify what kind of data we want. Our variables are MERITVE in MESEC. Now you have two options. You can specify in detail what kind of data you want, or you can download all of the data and then use some data manipulation to remove the parts you don’t need.

If you look carefully at the URL link, you’ll notice that it contains “en”. If you want English-named variables in your data, you can use this setting, but you can change it to “sl” (right in the URL) to get Slovenian names instead. However, be careful to use Slovenian names for the variables you use in your query.

query_list <- 
  list("MERITVE"=c("*"), # Use "*" to select all
       "MESEC"=c("*"))

px_query <- pxweb_query(query_list)
#translate into px syntax

Now, you can validate your query, and if you don’t have any errors, you are ready to go.

# if you don't get an error, it seems good to go.
pxweb_validate_query_with_metadata(px_query, px_api_call)

Now you can execute the query with function pxweb_get and transform to data.frame or any other data frame you like.

data_px_api <- pxweb_get(url = api_dataset,query = px_query)

final_data <- as.data.frame(data_px_api, 
                            column.name.type = "text", 
                            variable.value.type = "text")

And here it is, our data are here:

Ok, we can visualize a little bit