Friday, December 19, 2014

Should you wait or should you go



Visualizing the usage of Barcelona's public bicycle system looks cool and it might actually give you some general idea about when and where to find a bike. But imagine you find yourself at a station with no bikes. I certainly have, and I ponder the same question every time: "Should I wait for a while until a bike arrives, or would it be faster if I just walk to the next Bicing station?"

So, after almost 3 months of scraping Bicing's data, 944 lines of code and 2 weeks of non-stop parallel computing on 6 cores, I finally have the answer.

I wanted to share it with you, but I thought I'd rather let you get it yourself. So, I designed this little tool below. I call it BUBADE (Barcelona's Urban Bicycle Anticipation Decision Engine).  You only need to type or select the street where your empty station is. The tool will then tell you if you are better off waiting or walking.
For the geeks among you - there's a table with details. It tells you the chance that, within the average time it takes to walk to the 4 closest stations, a bike will have arrived where you are. It also gives you the probability that there will be bikes at each of the 4 closest stations by the time you arrive (given that there are no bikes at your station).

Bear in mind that these chances are averages. I would not rely on them if there's an exceptional event in the neighbourhood, like Fiesta de Gràcia or Poblenou Craft Beer Festival.
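As a rough illustration only (this is not the methodology behind the tool, and the data is made up), the "chance that a bike arrives while you wait" can be read as an empirical frequency over scraped snapshots. The sketch below assumes a hypothetical log with one row per minute for a single station; the names `snapshots`, `p_bike_arrives` and `walk_minutes` are invented for this example.

```r
# Toy snapshot log for one station: one row per minute (hypothetical data).
snapshots <- data.frame(
  minute = 0:11,
  bikes  = c(0, 0, 0, 1, 0, 0, 0, 0, 0, 2, 1, 0)
)

# Share of "station is empty" moments after which at least one bike
# is available at some point within the next `walk_minutes` minutes.
# Assumes rows are consecutive, evenly spaced one-minute snapshots.
p_bike_arrives <- function(log, walk_minutes) {
  empty_idx <- which(log$bikes == 0)
  # Keep only empty moments that have a full look-ahead window in the log
  empty_idx <- empty_idx[empty_idx + walk_minutes <= nrow(log)]
  arrived <- vapply(empty_idx, function(i) {
    window <- log$bikes[(i + 1):(i + walk_minutes)]
    any(window > 0)
  }, logical(1))
  mean(arrived)
}

# Chance of a bike showing up within a 3-minute walk-equivalent wait
p_wait <- p_bike_arrives(snapshots, walk_minutes = 3)
print(p_wait)
```

The same counting idea could be applied to a neighbouring station's log to estimate the chance of finding a bike there on arrival, and the two numbers compared.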



































If you want to learn more about the methodology I used - stay tuned! I will soon share my R code.

Tuesday, November 11, 2014

On Culture 2

I want to follow up on my previous post on culture and elaborate on the topic through some further analysis.
So, there are two questions that come naturally after plotting the 6 cultural dimensions of Hofstede's framework:

  1. Do some dimensions correlate with each other? In other words, do countries that score high on one dimension also score high on another?
  2. Given these dimensions, which countries are more similar to one another?



Correlation Analysis


The first question can easily be answered with a simple Pearson Correlation analysis. I plot the results in the correlation matrix below.



The strongest correlation is clearly between Power Distance and Individualism. In this case the relation is negative, meaning that the higher a society scores on the Power Distance index, the lower it will score on the Individualism index (i.e. it will be a more collective society) and vice versa. Although this isn't exactly news to most people, it does leave several questions open: "Why do more collective societies not question power legitimacy as much as individualistic societies?", "Is it the case that power left unchallenged would result in collectivism, or is it that collective societal structures would require a strong unquestioned power?" Let me know what you think in the comments below.

The second strongest correlation is between Long-term Orientation and Indulgence. Again, the relation is negative, meaning that the more long-term oriented a society is, the less accepting it will be of gratification from satisfying natural human drives (i.e. it will be more restrained). This makes a lot of sense - by definition long-term orientation will manifest itself in behaviours such as saving and educating oneself for the future. These are behaviours that naturally require restraining yourself from immediate gratification to reach a goal in the future - for example, when you choose to stay home and study for an exam instead of going out to party. The fact that the link is weaker suggests that there is more to it than that and indulgence does not necessarily result in short-termism.
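For readers who prefer not to go through the Rattle interface, a correlation matrix like the one discussed above can also be computed with base R's `cor()`. The sketch below uses made-up scores for five imaginary countries (not Hofstede's data), and the column names `pdi`, `idv` and `ivr` are placeholders chosen for this example.

```r
# Hypothetical scores for five made-up countries (not Hofstede's data).
toy <- data.frame(
  pdi = c(80, 35, 60, 40, 70),  # stand-in for Power Distance
  idv = c(20, 90, 45, 75, 30),  # stand-in for Individualism
  ivr = c(30, 65, 50, 70, 40)   # stand-in for Indulgence
)

# Pairwise Pearson correlations, rounded for readability
cor_mat <- round(cor(toy, method = "pearson"), 2)
print(cor_mat)

# Find the strongest off-diagonal correlation (by absolute value)
off_diag <- cor_mat
diag(off_diag) <- 0
strongest <- which(abs(off_diag) == max(abs(off_diag)), arr.ind = TRUE)[1, ]
strongest_pair <- rownames(cor_mat)[strongest]
print(strongest_pair)
```

With the real data, the same call on the six dimension columns would give the full 6 x 6 matrix.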


Cluster Analysis


The second question can be answered by conducting a Cluster Analysis. I have always been a bit sceptical of the K-means method, so I go for hierarchical clustering using Ward's agglomeration method. I choose to cut the dendrogram right where the red line is, so I end up with 5 clusters.
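Cutting a dendrogram at a line and asking for a fixed number of clusters are two views of the same operation. The sketch below shows this on R's built-in `USArrests` data, used here purely as a stand-in for the Hofstede scores; the variable names and the choice to place the line halfway between two merge heights are assumptions made for this example.

```r
# Built-in example data standing in for the Hofstede scores
d    <- dist(scale(USArrests), method = "euclidean")
tree <- hclust(d, method = "ward.D")

k <- 5
# tree$height holds the merge heights in increasing order.
# Any cut between the (n - k + 1)-th and (n - k + 2)-th height yields k clusters.
n <- length(tree$height)  # number of merges = observations - 1
cut_height <- mean(tree$height[c(n - k + 1, n - k + 2)])

# Draw the dendrogram and a red horizontal line at the cut
plot(tree, cex = 0.6)
abline(h = cut_height, col = "red")

# Cutting by height and cutting by number of clusters agree
groups_by_height <- cutree(tree, h = cut_height)
groups_by_k      <- cutree(tree, k = k)
table(groups_by_k)
```

Drawing the line with `abline()` is one way to avoid adding it to the plot by hand.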






I have also run summary statistics, where I calculate the average of each dimension of Hofstede's model per cluster. This allows one to easily see that:
  1. Cluster one is comprised of countries having the lowest average Power Distance, Masculinity and Uncertainty Avoidance and the highest average Individualism.
  2. Cluster two is comprised of the most collective societies with the shortest-term orientation, but with the highest level of Indulgence.
  3. Cluster three is a more moderate segment of societies with relatively high Uncertainty Avoidance.
  4. Cluster four is comprised of the most long-term oriented and the most masculine societies. At the same time, it feels the strongest anxiety from the uncertain and is the second most restrained segment (i.e. second lowest Indulgence).
  5. Cluster five has the highest Power Distance and the highest Restraint. While its societies are collective, it is the second most long-term oriented segment.












As before, here's my code:




# For the Correlation Matrix I use the Rattle User Interface
library(rattle)
rattle()


# I subset the data, omitting observations where data is not available
hofs.db_noNA<-na.omit(hofs.db[2:8])


library(stats)

# Create the distance matrix
db.to_clust<-dist(hofs.db_noNA[2:7], method = "euclidean")

# Create the clusters (named hofs.hclust so the hclust() function is not masked)
hofs.hclust <- hclust(db.to_clust, method="ward.D")

# Print the dendrogram with the country labels
plot(hofs.hclust, labels=hofs.db_noNA$country)

# Plot the dendrogram with 5 clusters.
# As this graph is not particularly pretty,
# I ended up using the one above, adding the additional elements manually
rect.hclust(hofs.hclust, k=5, border="red")


# Cut the hclust object into 5 clusters
hofs.groups2<-cutree(hofs.hclust, k=5)

# Assign the clusters to the original dataframe
hofs.db_noNA$clust<-hofs.groups2

# Calculate the average of each feature by cluster
aggregate(.  ~ clust, data=hofs.db_noNA[2:8], FUN = function(x) round(mean(x),0))

Sunday, November 2, 2014

On Culture


Introduction


As someone who works in the field of consumer behaviour and has lived as an expat for the past 5 years, I have been keenly interested in the topic of culture.
From a professional perspective, understanding how background affects the decisions of a certain cohort of customers has obvious implications for the marketing strategies a company employs in managing its B2C relationships. And on a more personal level - knowing how culture affects the decisions of your foreign friend can help you empathize better with that friend, and thus make you a better person.
Accounting for the differences between cultures could be a Sisyphean task, though, as there are numerous factors that will vary in importance depending on the context. Hence, a popular approach among researchers on the topic is to create a framework with a limited number of cultural dimensions for a given context of use. A well-known framework of this kind is Hofstede's Cultural Dimensions Theory. While it has its fair share of criticism, it is generally considered one of the most comprehensive models by those studying culture for business applications.

Hofstede's website has a good tool for comparing two countries and I give it additional points for using my favourite Highcharts library. Nevertheless, one thing that is missing, and would be interesting for overall comparison purposes, is a visualization of each of the 6 dimensions on the world map.
So, below I will briefly summarize what each dimension is about and post a visualization of how each one is distributed across the world map. At the end I will provide the code I used to generate the maps.

One important note - the graphs below are rendered with Flash, so if your mobile device does not support it, they won't load.


Power Distance


The Power Distance dimension expresses the degree to which the members of a society accept the unequal distribution of power. People in countries with low power distance (light green) aim at equalizing the distribution of power and demand justification for inequalities, whereas individuals in countries with high power distance (dark green) are more likely to accept a hierarchical order and less likely to require justification.
Here's how countries look on this dimension.







Individualism


The definition of this dimension is rather straightforward - in more individualistic societies (dark green) members are expected to take care of themselves and their immediate families, while personal achievements and individual rights are essential. On the other hand, in more collective societies (light green) individuals would usually have a large extended family, and would act as a part of a larger group.








Masculinity


Masculine societies (dark green) value assertiveness, competitiveness and ambition for power, and the difference between gender roles is more pronounced. In contrast, feminine societies value modesty and cooperation.








Uncertainty Avoidance


As the name clearly suggests, this dimension measures to what extent a society is willing to tolerate uncertainty. Countries with a high uncertainty avoidance index (dark green) will strive to minimize the anxiety from uncertainty by maintaining a governing code and by not tolerating unconventional behaviours. Societies with lower uncertainty avoidance tend to be more pragmatic and accepting of the changing nature of life. They tend to have fewer rules and encourage individuals to discover their own truth.








Long-term Orientation


Again, as the name clearly suggests, societies scoring high on this dimension (dark green) will encourage future orientation that will manifest itself in behaviours related to persistence, saving, and educating oneself to prepare for the future. Countries with more short-term orientation promote values related to the past and the present - honouring traditions, fulfilling one's social obligations, reciprocation and so on.







Restraint vs. Indulgence


The final dimension of Hofstede's framework measures whether a society is accepting of gratification coming from the exercise of natural human drives or whether it tries to suppress gratification of needs through strict social norms. In more restrained societies (light green) individuals will expect material rewards for a job well done and will pay attention to status items. In contrast, societies with a high indulgence index (dark green) would not be so easily motivated by material rewards, and items need to fulfil a purpose rather than serve as a status symbol.







Code


I scraped the data from various sources, and I used Hofstede's website to manually fill in some of the values I was missing. So, I can't provide you with the source of my data, but if you want to play around with it - you can easily scrape it from the charts above.

I generated the charts with the googleVis library in R. Here's my code:


# First, load your data

hofs.db<-read.csv("C://Type-a-path-here//HofstedeDimensions.csv", sep=";")

hofs.db$ctr<-as.character(hofs.db$ctr)
hofs.db$country<-as.character(hofs.db$country)



# Rename country names that are not recognized by the googleVis library


hofs.db[2][hofs.db[2]=='Bosnia']<-'Bosnia i Hercegovina'
hofs.db[2][hofs.db[2]=='Czech Rep']<-'Czech Republic'
hofs.db[2][hofs.db[2]=='Dominican Rep']<-'Dominican Republic'
hofs.db[2][hofs.db[2]=='Great Britain']<-'United Kingdom'
hofs.db[2][hofs.db[2]=='Korea South']<-'South Korea'
hofs.db[2][hofs.db[2]=='Kyrgyz Rep']<-'Kyrgyzstan'
hofs.db[2][hofs.db[2]=='Macedonia Rep']<-'Macedonia'
hofs.db[2][hofs.db[2]=='U.S.A.']<-'United States'


# Give the columns nice names

colnames(hofs.db)[3]<-"Power Distance"
colnames(hofs.db)[4]<-"Individualism"
colnames(hofs.db)[5]<-"Masculinity"
colnames(hofs.db)[6]<-"Uncertainty Avoidance"
colnames(hofs.db)[7]<-"Long-Term Orientation"
colnames(hofs.db)[8]<-"Restraint vs Indulgence"


#Create the plot. To create one per each dimension you'd need to manually change the numvar below

library(googleVis)

G1 <- gvisGeoMap(hofs.db,locationvar='country', numvar='Power Distance'
                 #, hovervar='Power Distance'
                 ,options=list(dataMode='regions'
                               ,width=1012
                               ,height=649
                               ,showZoomOut=TRUE
                               #,colors=c('0x80FF51',0xFF5151)
                 )
)

# Remove caption
G1$html$caption<-''
# Remove the footer
G1$html$footer <-""

print(G1, 'chart')




