
Tracking the Congressional attention span

Posted on Feb 20, 2007 in Blog, Data Visualization, Datamining, Emerging Science and Technology, TechnoActivism

Ars Technica reports: "While text mining 330,000 New York Times articles poses an interesting challenge, it’s not as interesting as sifting through 70 million words (from over 70,000 unique documents) found in the Congressional Record. A team of political science researchers found that their software was able to answer questions too difficult for humans to handle on their own.

What’s exciting about this project and others like it is that computers are at last capable of unsupervised, dynamic analysis, and they can produce meaningful results with little or no intervention (humans will still be required to interpret the results, of course). The researchers in this project turned their software loose on 70 million words of Congressional debate without doing any initial topic coding. Researchers wanted to know several things: how do elected leaders distribute their attention? Under what circumstances do leaders push or follow public attention to an issue? Is debate on most issues incremental or explosive? Now that they could accurately track topics over time, the researchers found, for instance, that "judicial nominations" have consumed steadily more Congressional attention between 1997 and 2004. In fact, the topic produced the most number of words published in a single "day" of the Congressional Record: 230,000 on November 12, 2003.

Another hot issue, abortion, has moved in the other direction. Abortion has steadily received less Congressional attention over the last decade, and floor speeches on abortion now remain stable at one percent of the total (down from six percent in the 105th Congress)."
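The "percent of the total" figures in the quote are shares of words per topic within each Congress. A minimal sketch of that arithmetic follows, assuming each speech already carries a topic label and a word count. The function name and the sample numbers are made up for illustration; the researchers' software discovered topics without any hand coding, which this sketch does not attempt.

```python
from collections import defaultdict


def attention_share(speeches):
    """Return {period: {topic: fraction of that period's words}}.

    `speeches` is an iterable of (period, topic, word_count) tuples.
    The topic labels are assumed to be given; no topic discovery happens here.
    """
    words = defaultdict(lambda: defaultdict(int))
    for period, topic, word_count in speeches:
        words[period][topic] += word_count

    shares = {}
    for period, by_topic in words.items():
        total = sum(by_topic.values())
        if total == 0:
            continue  # skip periods with no words rather than divide by zero
        shares[period] = {topic: n / total for topic, n in by_topic.items()}
    return shares


# Made-up word counts, for illustration only (not the study's data).
sample = [
    ("105th", "abortion", 600),
    ("105th", "other", 9400),
    ("108th", "abortion", 100),
    ("108th", "other", 9900),
]

for period, by_topic in attention_share(sample).items():
    print(period, by_topic)
```

Comparing the same topic's share across periods is what lets one say that attention to an issue rose or fell.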


Price Protection and Pricing Transparency

Posted on Jan 9, 2007 in Blog, Datamining, Media and Markets

Many experts have written about the threat that total “price transparency” represents to traditional retailers, and many retailers have seen their businesses decline due to the online consumer’s ability to easily compare prices. Even in a store, new barcode readers inside some cell phones allow consumers to check online pricing across vendors and decide immediately whether the in-store price is favorable. Now there is a new service that adds further price transparency: PriceProtectr. Among other things, this service will notify you if prices drop on an item at any of dozens of retailers. This is also valuable where stores offer “price protection”: if the price of a purchased item drops within a certain period, typically 30 days, the buyer receives the difference.
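The price protection rule described here comes down to a date check and a subtraction. A minimal sketch, assuming a 30-day window and a refund equal to the full difference; the function name and parameters are hypothetical, and this is not PriceProtectr's code or any retailer's actual policy.

```python
from datetime import date


def price_protection_refund(paid, purchase_date, new_price, drop_date, window_days=30):
    """Return the refund owed under a simple price protection rule.

    Assumes the buyer receives the full difference if the price drops
    within `window_days` of the purchase; otherwise nothing is owed.
    """
    days_since_purchase = (drop_date - purchase_date).days
    in_window = 0 <= days_since_purchase <= window_days
    if in_window and new_price < paid:
        return round(paid - new_price, 2)
    return 0.0


# A price drop 18 days after purchase falls inside the window.
print(price_protection_refund(199.99, date(2007, 1, 2), 179.99, date(2007, 1, 20)))

# A price drop 44 days after purchase falls outside it.
print(price_protection_refund(199.99, date(2007, 1, 2), 179.99, date(2007, 2, 15)))
```

A notification service only needs to run a check like this each time it sees a new price for an item the user has bought.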


Finding subversives by datamining Amazon’s “wishlists”

Posted on Aug 20, 2006 in Blog, Data Visualization, Datamining, Technology and Privacy

Tom Owad tells us: "It used to be you had to get a warrant to monitor a person or a group of people. Today, it is increasingly easy to monitor ideas. And then track them back to people. Most of us don’t have access to the databases, software, or computing power of the NSA, FBI, and other government agencies. But an individual with access to the internet can still develop a fairly sophisticated profile of hundreds of thousands of U.S. citizens using free and publicly available resources."

Owad proves the point with a script that pulls some 260,000 user wishlists from the Amazon database, then datamines the lists for "subversive" topics and books. The results, and the comments about the experiment on his blog, are scary, to say the least.


AOL Apologizes for Release of User Search Data

Posted on Aug 8, 2006 in Blog, Datamining, Technology and Privacy

AOL says it "screwed up" when it recently released to researchers the search terms that 658,000 subscribers entered over a three-month period, and it has since retracted the data. Privacy advocates said that the search data could be linked to individual users, even though AOL replaced the names and user IDs of searchers with identification numbers. Examples of search terms that could be connected to individual users were strings such as "can you adopt after a suicide attempt" and "how to tell your family you are a victim of incest." Some search strings sought information on proper names or specific social security numbers. It seems that users frequently requested information about local businesses that could reveal their location, in concert with checking their own name, credit card numbers, or social security data against search indices.

Several blogs are pointing to mirror sites to let people look at the search logs of AOL users. See http://www.gregsadetsky.com/aol-data/ for one collection.

UPDATE on Aug. 9, 2006: Interesting development reported by the New York Times today, "AOL Searcher No. 4417749 Is Exposed." An enterprising reporter was able to identify and track down one of the supposedly "anonymous" searchers whose search terms were released by AOL, just by examining the pattern of her search queries. Her name is Thelma Arnold and she lives in a small town in Georgia. Arnold, identified as Searcher No. 4417749, conducted hundreds of searches over a three-month period on topics ranging from “numb fingers” to “60 single men” to “dog that urinates on everything.”

According to the NY Times story, "search by search, click by click, the identity of AOL user No. 4417749 became easier to discern. There are queries for “landscapers in Lilburn, Ga,” several people with the last name Arnold and “homes sold in shadow lake subdivision gwinnett county georgia.”"

“Those are my searches,” she said, after a reporter read part of the list to her.
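Why replacing names with identification numbers did not make the data anonymous can be seen in a few lines: a stable number still ties every query from one person together, so the queries can be read as a single profile. The sketch below assumes the log is a list of (searcher number, query) pairs, which is a simplification and not AOL's actual file format. The function names and the second searcher (number 1000001) are hypothetical; the other queries are the ones quoted in this post.

```python
from collections import defaultdict


def group_by_searcher(log):
    """Collect every query issued under the same pseudonymous number."""
    profiles = defaultdict(list)
    for searcher_id, query in log:
        profiles[searcher_id].append(query)
    return dict(profiles)


def queries_mentioning(profile, terms):
    """Return the queries in one profile that contain any of the given terms."""
    lowered = [term.lower() for term in terms]
    return [q for q in profile if any(term in q.lower() for term in lowered)]


# Simplified log: the 4417749 queries are quoted in the post above;
# searcher 1000001 is invented to show that profiles stay separate.
log = [
    (4417749, "numb fingers"),
    (1000001, "weather forecast"),
    (4417749, "landscapers in Lilburn, Ga"),
    (4417749, "dog that urinates on everything"),
    (4417749, "homes sold in shadow lake subdivision gwinnett county georgia"),
]

profiles = group_by_searcher(log)
print(queries_mentioning(profiles[4417749], ["lilburn", "gwinnett"]))
```

No single query names the searcher, but the location queries taken together narrow the candidates to one small area, which is the pattern the reporter followed.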


Bush administration internet regulation proposal

Posted on Jul 19, 2006 in Blog, Datamining, Technology and Privacy

Attorney General Alberto Gonzales has suggested a mandatory website "self-rating" system. Naturally, the contradiction in terms has not gone unnoticed.

The system, very similar to one suggested under Clinton’s administration, would require by law all commercial websites to place ‘marks and notices’ on each page containing ‘sexually explicit’ content, with a penalty of up to 5 years imprisonment. From the article: "A second new crime would threaten with imprisonment Web site operators who mislead visitors about sex with deceptive ‘words or digital images’ in their source code–for instance, a site that might pop up in searches for Barbie dolls or Teletubbies but actually features sexually explicit photographs. A third new crime appears to require that commercial Web sites not post sexually explicit material on their home page if it can be seen ‘absent any further actions by the viewer.’"


Mashup Fever

Posted on Jun 11, 2006 in Blog, Data Visualization, Datamining, Geolocation and Psychogeography

For months the announcements of new mashups have been coming fast and furious. Earlier this year there were only a few hundred mashups… then a few thousand… and every day there are more. It reminds me of 1994-1995 all over again, when new websites were being added to Netscape every day and announced in such places as "best of the net." Soon the sites tracking new mashups won’t be able to keep up. But they are an amazing and (sometimes) quite useful phenomenon. My favorite from today’s roundup is "Real EstateFU," which shows on a helpful map which locations in the San Francisco Bay Area are suffering from housing bubble implosion.
