Friday, November 21, 2014

Visualizing Barcelona's Public Bikes Usage

Did you know that Spain has the greatest number of bicycle sharing systems in the world? The high demand makes a lot of sense - the mild winter leaves you no bad-weather excuses for not biking, and in the summer the bicycle is the most pleasant way to get to the beach. But if you have ever tried to go to the beach on a bici (that's what the public bikes are called in Barcelona) on a lovely Sunday morning, you've probably found your closest station empty. So you walk under the scorching Spanish sun to the next station, and quite possibly to the one after that, until you find a bicycle. Once you finally arrive at the beach, all the stations are full, and there's a line of people waiting to return their bici.

Each time this happened to me, it got me thinking that there should be a way to forecast the demand in order to improve the supply. Fortunately, the guys at CityBikes provide an API with the momentary availability of bikes. So I scraped several days' worth of data, averaged the bikes per station and produced the following image.

The white dots are the stations that are on average (almost) full and the red ones are (again almost) always empty.
Great, but this doesn't help at all with the forecast: even though the station closest to my apartment is yellow (meaning that on average there are bikes), it is always empty in the morning and always full at night. The obvious solution is to average the bikes at each station by time of day and then create an animation from that. And I did. Here's the result:
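The time-of-day averaging can be sketched on toy data. This is only an illustration: the station names, timestamps and bike counts are made up, and the `slot` column is a hypothetical helper that floors each reading into a 10-minute slot (the script at the end of the post rounds to the nearest 10 minutes instead).

```r
# Toy snapshots for two hypothetical stations (all values are made up)
snapshots <- data.frame(
  station   = c("A", "A", "A", "B", "B", "B"),
  timestamp = c("2014-11-01T08:02", "2014-11-02T08:04", "2014-11-01T20:01",
                "2014-11-01T08:03", "2014-11-02T08:01", "2014-11-01T20:04"),
  bikes     = c(0, 2, 20, 15, 17, 1),
  stringsAsFactors = FALSE
)

# Parse the timestamp and derive a 10-minute time-of-day slot such as "08:00"
parsed <- as.POSIXlt(snapshots$timestamp, format = "%Y-%m-%dT%H:%M", tz = "UTC")
slot_minute <- (parsed$min %/% 10) * 10
snapshots$slot <- sprintf("%02d:%02d", parsed$hour, slot_minute)

# Average the bike count per station and slot across all days
by_slot <- aggregate(bikes ~ station + slot, data = snapshots, FUN = mean)
by_slot <- by_slot[order(by_slot$station, by_slot$slot), ]
print(by_slot)
```

In this toy example station A averages one bike at 08:00 and twenty at 20:00, which is exactly the kind of morning/evening swing that a single all-day average hides.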


So here are a few patterns that can be observed:

  • The stations around Plaça Catalunya and Las Ramblas get packed after 7:00 and empty after 21:00.
  • Raval is generally well supplied with bikes, except between 7:00 and 13:00.
  • Sant Gervasi, Eixample Esquerra and Eixample Dreta are chronically left without bicycles.
  • In Sant Marti, the area to the right of Marina and around Poblenou (which is famous for both its many start-up offices and its night clubs) has bikes except between 16:00 and 21:00.
  • And finally, the area around Navas and Clot empties around 8:00 and fills around 18:00. That's why I can never find a bike in the morning, and can never return one at night.

At this level of aggregation, the patterns are not very clear and look rather chaotic. If the Ayuntamiento de Barcelona were willing to make the public bicycles' data public, it would not be too hard to create a visual tracing of the traffic patterns like this one from The Guardian. That would make it possible to build a predictive model that tells the Bicing service trucks where and when bikes are needed.

If you liked the post, or even if you did not like it - let me know in the comments below.
If you want to try it yourself, here's my R code:


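# Packages used below. The post does not say which JSON package provided
# fromJSON(); rjson is an assumption here.
library(RCurl)         # getURL
library(rjson)         # fromJSON
library(RgoogleMaps)   # GetMap, LatLon2XY.centered, PlotOnStaticMap, AddAlpha
library(MASS)          # kde2d
library(RColorBrewer)  # brewer.pal
library(classInt)      # classIntervals, findColours
library(animation)     # saveHTML

# Scrape the data (the API URL is missing from this copy of the post)
raw_data <- fromJSON(getURL(""))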

# Convert the JSON (a list of station records) into a table
new_data <- do.call("rbind", raw_data)

# At this point I save the table to the hard drive

write.table(new_data, file="C://Bici//bici.csv", sep = ";",
            row.names = FALSE)

# I automate this part to run every x minutes in order to get several days' worth of data

#### Load the data, remove any duplicates

bici.df <- read.csv(file="C://Bici//bici.csv", sep=";", header=TRUE)
bici.df <- unique(bici.df)


# Extract Hours and minutes from the timestamp

bici.df$hour<-strftime(as.POSIXlt(bici.df$timestamp, format="%Y-%m-%dT%H:%M", tz="UTC"), format="%H")
bici.df$minutes<-strftime(as.POSIXlt(bici.df$timestamp, format="%Y-%m-%dT%H:%M", tz="UTC"), format="%M")

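# Format to numerical; round minutes to nearest 10
bici.df$hour<-as.integer(bici.df$hour)
bici.df$minutes<-round(as.integer(bici.df$minutes), -1)

# This fixes results such as 00:60 to 01:00
# (the loop body is cut off in this copy of the post; what follows is a
# reconstruction based on the comment above)
for (i in 1:nrow(bici.df)) {
  if (bici.df$minutes[i] == 60) {
    bici.df$minutes[i] <- 0
    bici.df$hour[i] <- (bici.df$hour[i] + 1) %% 24
  }
}
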
# Aggregate

bici.mean<-aggregate(bikes ~ station_size+lat+lng+hour+minutes, bici.df , mean)

bici.mean <- bici.mean[order(bici.mean$hour, bici.mean$minutes),] 


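# The coordinates are scaled by 1e6 in the data; convert them to degrees
bici.mean$lat<-bici.mean$lat/1000000
bici.mean$lng<-bici.mean$lng/1000000

# Centre the map on the mean station position
m.lat <- mean(bici.mean$lat)
m.lon <- mean(bici.mean$lng)
map <- GetMap(center=c(lat=m.lat, lon=m.lon), zoom=13, size = c(640, 640))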

# Create the HTML animation

# One frame per distinct (hour, minutes) slot.
# (This definition and the saveHTML({ opener are missing from this copy of
# the post; both are reconstructed.)
bici.mean.time <- unique(bici.mean[, c("hour", "minutes")])

saveHTML({
for (i in 1:nrow(bici.mean.time)) {
  bici.mean.s<-subset(bici.mean, hour==bici.mean.time[i,1] & minutes==bici.mean.time[i,2])

  coords <- LatLon2XY.centered(map, bici.mean.s$lat, bici.mean.s$lng, 13)
  coords <- data.frame(coords)
  k2 <- kde2d(coords$newX, coords$newY, n=500)

  # Create exponential transparency vector
  # (the start of the sequence is lost in this copy; 0.5 is an assumption)
  alpha <- seq.int(0.5, 0.95, length.out=100)
  alpha <- exp(alpha^6-1)
  cols<-rev(colorRampPalette(brewer.pal(8, 'RdYlGn'))(100))
  cols2 <- AddAlpha( cols, alpha)

  # Name the added columns so that k2z$bikes can be used below
  k2z<-cbind(coords, bikes=bici.mean.s$bikes, station_size=bici.mean.s$station_size)

  # Define colour palette to be used
  plotclr <- heat.colors(length(unique(k2z$bikes)), alpha = 1)

  # Define colour intervals and colour code variable for plotting
  class <- classIntervals(k2z$bikes, length(unique(k2z$bikes)), style = "pretty")
  colcode <- findColours(class, plotclr)

  # Plot the stations on the map, one point per station
  PlotOnStaticMap(map, size=c(640,640))
  points(coords$newX, coords$newY, col= colcode, pch=16, cex=1)
  legend("bottomright", legend=c("Empty","Half Full","Full"),  fill=c("red","gold","white"), border="black",title = "Station is...")
  legend("topleft", legend=paste("Time: ",unique(bici.mean.s$hour), ":",ifelse(unique(bici.mean.s$minutes)==0,"00",unique(bici.mean.s$minutes)), sep=""), border="black", pch=NA)
}
}, img.name = "img", imgdir = "bici", htmlfile = "index.html",
outdir = getwd(), autobrowse = FALSE, ani.height = 640, ani.width = 640,
verbose = FALSE, autoplay = TRUE, title = "Bicycle Availability")
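
The script mentions running the scraping step every x minutes but does not show how the snapshots accumulate. Here is one possible sketch, not necessarily how I set it up originally: `append_snapshot()` is a hypothetical helper, and the data and file path are made up for the demonstration. It writes the header only once and appends every later snapshot to the same file.

```r
# Hypothetical helper: append one snapshot (a data frame) to a CSV file,
# writing the header only when the file does not exist yet.
append_snapshot <- function(snapshot, path) {
  is_new <- !file.exists(path)
  write.table(snapshot, file = path, sep = ";", row.names = FALSE,
              col.names = is_new, append = !is_new)
  invisible(path)
}

# Demonstration with made-up data and a temporary file
csv_path <- tempfile(fileext = ".csv")
first  <- data.frame(id = 1:2, bikes = c(3, 0), timestamp = "2014-11-01T08:00")
second <- data.frame(id = 1:2, bikes = c(1, 5), timestamp = "2014-11-01T08:10")
append_snapshot(first, csv_path)
append_snapshot(second, csv_path)

# Read everything back and drop exact duplicate rows
all_snapshots <- unique(read.csv(csv_path, sep = ";", header = TRUE))
print(nrow(all_snapshots))
```

A scheduler such as cron or the Windows Task Scheduler can then call a script like this through Rscript at the chosen interval.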
