Applying Sentiment Analysis to Star Wars: The Force Awakens

Posted on Jan 20, 2016 in Blog, Data Visualization, Datamining, Geolocation and Psychogeography

KDNuggets, one of the more influential sites for data scientists, recently published a case study showing how sentiment analysis can be applied to track the reaction to a film during its early release cycle. In this case, the film was the 2015 holiday blockbuster Star Wars: The Force Awakens.

10 million tweets were collected through the Twitter API between 12/4/15 and 12/29/15, with the release date falling on 12/17/15. About 2.5% contained geolocation data, either in the form of direct coordinates or a human-readable location (e.g. New York). The researchers said “…the first thing we looked at was the frequency of Star Wars related tweets in time. It is clearly visible that most of the tweets came from US and UK, which can be easily explained by popularity of Twitter itself in these countries. Next thing to see is the periodicity of day and night, where people tweet more at night than during the day. Also the timezone shift is clearly visible. More interestingly, we can see the build up before the release, as the number of tweets is increasing for a few days before the world premiere and sky rocketing on this day…”

Each tweet was assigned a score between -1 and +1 (-1 being highly negative, +1 highly positive). Results were plotted on a hexbin map that visualizes global sentiment, with each cell showing the mean score of the tweets that fall inside it. Interestingly, average sentiment declines steadily as time passes. There is an observable dip on the day of the world premiere, but “sentiments keep steadily low the whole time.” The researchers make several observations concerning the results. Since worldwide interest in the film, at least as reported in the media, approached general hysteria, why doesn’t the Twitter analysis parallel this?
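The aggregation step described above (bin geolocated tweets into cells, then take the mean score per cell) can be sketched in a few lines. This is only an illustration, not the researchers' code: it uses a plain square latitude/longitude grid as a stand-in for hexagonal bins, and the function name `mean_sentiment_by_cell`, the `cell_deg` parameter, and the sample tweets are all made up for the example.

```python
import math
from collections import defaultdict


def mean_sentiment_by_cell(tweets, cell_deg=5.0):
    """Average sentiment scores per grid cell.

    tweets is an iterable of (lat, lon, score) tuples, with score in [-1, +1].
    Cells are squares of cell_deg degrees (an assumption; the case study
    used hexagonal bins).
    """
    sums = defaultdict(float)
    counts = defaultdict(int)
    for lat, lon, score in tweets:
        cell = (math.floor(lat / cell_deg), math.floor(lon / cell_deg))
        sums[cell] += score
        counts[cell] += 1
    return {cell: sums[cell] / counts[cell] for cell in sums}


# Hypothetical scored tweets: two near New York, one in London.
sample_tweets = [
    (40.7, -74.0, 0.8),
    (40.9, -73.9, -0.2),
    (51.5, -0.1, 0.5),
]
cell_means = mean_sentiment_by_cell(sample_tweets)
```

Because each cell reports a mean, a handful of strongly negative tweets can pull a sparsely populated cell down sharply, which is worth keeping in mind when reading such a map.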

One possible explanation is the sampling bias inherent in social network data. After all, the data comes only from those who voluntarily decide to share, and these are usually the people with stronger opinions, either highly positive or highly negative, which produces a somewhat polarizing effect. In addition, sentiment analysis is constrained by the modeling methods and tools available for Natural Language Processing (NLP), and one of these constraints is that the algorithms require an English-language corpus. A sentiment analysis with a global sampling plan will therefore have gaps in its dataset, since non-English texts are omitted from the analysis.

Anonymous data may still not be anonymous enough

Posted on Mar 15, 2015 in Blog, Datamining, Emerging Science and Technology, Technology and Privacy

It has happened several times before, and now yet another case has been reported in which individuals connected to “anonymous” or “anonymized” data were ultimately identified by researchers.

This time, data scientists analyzed credit card transactions made by 1.1 million people in thousands of stores over 90 days. The data set contained fields such as the date of the transaction, the amount charged, and the name of the store. Personal details such as names and account numbers were removed, but the “uniqueness of people’s behavior” still made them identifiable. Just four random pieces of information were enough to re-identify 90% of shoppers in the database and attach them to other identity records. Researchers at MIT Media Lab, authors of the study, concluded that “the old model of anonymity does not seem to be the right model when we are talking about large scale metadata.”

“A data set’s lack of names, home addresses, phone numbers or other obvious identifiers,” they wrote, “does not make it anonymous nor safe to release to the public and to third parties.”

The full study was published in early 2015 in Science.
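To make the idea of behavioral uniqueness concrete, here is a minimal sketch of how one might estimate the share of people singled out by a few known records. It is not the study's method or code: the function `unicity`, its parameters, and the toy `traces` data are assumptions for illustration, and each record is reduced to a (day, store) pair.

```python
import random


def unicity(traces, k=4, trials=20, seed=0):
    """Estimate the share of people singled out by k randomly chosen records.

    traces maps a person id to a set of (day, store) records.
    People with fewer than k records are skipped.
    """
    rng = random.Random(seed)
    eligible = [p for p, recs in traces.items() if len(recs) >= k]
    if not eligible:
        return 0.0
    unique_hits = 0
    for _ in range(trials):
        for person in eligible:
            # Pretend an outside observer knows k of this person's records.
            known = set(rng.sample(sorted(traces[person]), k))
            # Everyone whose trace contains all of the known records.
            matches = [p for p, recs in traces.items() if known <= recs]
            if matches == [person]:
                unique_hits += 1
    return unique_hits / (trials * len(eligible))


# Hypothetical toy data: three shoppers with three records each.
traces = {
    "a": {(1, "cafe"), (2, "grocer"), (3, "bookshop")},
    "b": {(1, "cafe"), (2, "grocer"), (4, "cinema")},
    "c": {(1, "cafe"), (5, "bakery"), (6, "gym")},
}
share_with_three = unicity(traces, k=3)
share_with_one = unicity(traces, k=1)
```

In this toy data every full three-record trace is distinct, so knowing three records always singles a person out, while a single record such as the shared cafe visit often matches several people.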

Visualizing Publicly Available US Government Data Online

Posted on Sep 19, 2014 in Blog, Data Visualization


Brightpoint Consulting recently released a small collection of interactive visualizations based on open, publicly available data from the US government. Characterized by a rather organic graphic design style and color palette, each visualization makes a socially and politically relevant dataset easily accessible.

The custom chord diagram titled Political Influence highlights the monetary contributions made by the top Political Action Committees (PACs) in the 2012 congressional election cycle, for both the House of Representatives and the Senate.

The hierarchical browser 2013 Federal Budget reveals the major flows of spending in the US government, at the federal, state, and local level, such as the relationship of spending between education and defense.

The circular flow chart United States Trade Deficit shows the US trade deficit by month over the last 11 years. The United States sells goods to the countries at the top, while the countries at the bottom sell goods to the US. The dollar amount in the middle represents the cumulative deficit over this period.

Visits: Mapping the Places you Have Visited

Posted on Sep 1, 2014 in Blog, Data Visualization

Visits automatically visualizes personal location histories, trips and travels by aggregating one’s geotagged Flickr collection with a Google Maps history. Developed by Alice Thudt, Dominikus Baur and Prof. Sheelagh Carpendale, the map runs locally in the browser, so no sensitive data is uploaded to external servers.

The timeline visualization goes beyond the classical pin representation, in which pins tend to overlap and are relatively hard to read. Instead, the data is shown as ‘map-timelines’, a combination of maps and a timeline that conveys location histories as sequences of maps: the bigger the map, the longer the stay. This way, the temporal sequence is clear, as the trip starts with the map on the left and continues towards the right.
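The "bigger map, longer stay" rule can be illustrated with a small layout calculation. This is a sketch under a simple assumption, namely that each map's width is linearly proportional to the length of the stay; the actual scaling used by Visits is not stated here, and the function `map_widths` and the sample trip are hypothetical.

```python
def map_widths(stays, total_width=1000):
    """Split total_width pixels among stays in proportion to their length.

    stays is an ordered list of (place, days) pairs; order is preserved so
    the trip still reads from left to right.
    """
    total_days = sum(days for _, days in stays)
    return [(place, total_width * days / total_days) for place, days in stays]


# Hypothetical trip: a long stay in the middle, two short ones around it.
trip = [("Lisbon", 2), ("Porto", 6), ("Madrid", 2)]
layout = map_widths(trip)
```

With these numbers the middle stop receives three times the width of either neighbor, so the longest stay dominates the timeline at a glance.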

A place slider adjusts the map granularity, ranging from street level to country level.

Read the academic research here [PDF]

Culturegraphy: the Cultural Influences and References between Movies

Posted on Aug 22, 2014 in Blog, Data Visualization


Culturegraphy, developed by “Information Model Maker” Kim Albrecht, reveals the complex relationships among over 100 years of movie references.

Movies are shown as unique nodes, while their influences are depicted as directed edges. The color gradients from blue to red that originate in the 1980s denote the era of postmodern cinema, the era in which movies tend to adapt and combine references from other movies.
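The node-and-edge model described above maps naturally onto an edge list. The sketch below is only an illustration of that data structure, not Culturegraphy's implementation: the film labels are placeholders, and the direction convention (an edge points from the referencing movie to the movie it references) is an assumption.

```python
from collections import Counter

# Each edge is (referencing_movie, referenced_movie); labels are placeholders.
edges = [
    ("Film C", "Film A"),
    ("Film C", "Film B"),
    ("Film D", "Film A"),
    ("Film D", "Film C"),
]


def influence_counts(edge_list):
    """Count how many times each movie is referenced by another movie."""
    return Counter(target for _, target in edge_list)


counts = influence_counts(edges)
most_referenced = counts.most_common(1)[0]
```

Counting incoming edges like this gives a rough measure of how influential a movie is within the graph.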

Although the visualizations look rather minimalistic at first sight, their interactive features are quite sophisticated and the resulting insights are interesting. Do not miss the explanatory movie.

Via @albertocairo.

Beatquake: the Music Listening Activity across Facebook over 90 days

Posted on May 28, 2014 in Blog, Data Visualization

Mapping Music on Facebook by Stamen Design for Facebook shows the dynamic characteristics of typical listening activity across Facebook.

Inspired by the dynamic movement of a graphic equalizer, Beatquake maps the popularity of the three most popular songs in the U.S. each day over the course of 90 days, by way of vertically moving particles.

Colored layers, each representing one song, rise and fall over geographic locations to correspond with the number of plays in that area. The texture of the map is driven by BPMs (beats per minute), and thus changes as one song overtakes another in popularity.

