I’m going to sidestep the “URL shorteners are bad because they obfuscate” discussion in this post. If you’re reading this, you likely have an opinion one way or another on that topic, but let’s leave that at the door. A bigger challenge is emerging as URL shortening continues to proliferate.
Web browsers unwinding a shortened URL when a user clicks on one is one thing, but when system software tries to unwind/resolve shortened URLs en masse, a problem emerges. The database that binds the short URL to its long version is hidden behind an API that can’t handle, or won’t allow (and I’m pointing at all of you URL shorteners out there), bulk unwinding of shortened URLs. The result is a bottleneck (the URL shortening services) that prevents “real-time” indexing of otherwise publicly available content. “Classic” offline crawl based search engines (e.g. Google, Y!, etc) will likely unwind in a latent “offline” manner, based on relevance. However, real-time search facilities are faced with trying to unwind large numbers of shortened URLs on the fly, and there doesn’t appear to be a way to accomplish this as the volume/rate of shortened URLs ever increases in daily social activity.
If your business relies on unwinding large volumes of shortened URLs in real-time, you’re faced with the usual optimization suspects: caching & relevance/prioritization based resolution. These will improve your ability to “keep up”, but they are a function of cache/hit ratios (which are generally poor in the social space when it comes to URL unwinding) and your own ability to decide what to unwind in an ever increasing volume of shortened URLs.
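As a rough illustration of the two optimizations named above (caching plus relevance-based prioritization), here is a minimal sketch. The `CachingUnwinder` class, the `resolve` callable, and the per-batch `budget` are all hypothetical names introduced for this example; in practice `resolve` would be whatever single-URL lookup a given shortener permits, and the relevance scores would come from your own ranking.

```python
from typing import Callable, Dict, Iterable, Optional, Tuple


class CachingUnwinder:
    """Resolve short URLs through a cache, most relevant first (illustrative only)."""

    def __init__(self, resolve: Callable[[str], Optional[str]], budget: int):
        self._resolve = resolve  # hypothetical one-URL-per-call shortener lookup
        self._budget = budget    # assumed cap on shortener lookups per batch
        self._cache: Dict[str, str] = {}
        self.hits = 0
        self.misses = 0

    def unwind_batch(self, scored_urls: Iterable[Tuple[str, float]]) -> Dict[str, str]:
        """Take (short_url, relevance) pairs and return a short -> long mapping."""
        resolved: Dict[str, str] = {}
        remaining = self._budget
        # Spend the limited lookup budget on the most relevant URLs first.
        ordered = sorted(scored_urls, key=lambda pair: pair[1], reverse=True)
        for short, _score in ordered:
            if short in self._cache:
                self.hits += 1
                resolved[short] = self._cache[short]
                continue
            self.misses += 1
            if remaining == 0:
                continue  # over budget: this URL stays unresolved for now
            remaining -= 1
            long_url = self._resolve(short)
            if long_url is not None:
                self._cache[short] = long_url
                resolved[short] = long_url
        return resolved
```

The `hits` and `misses` counters make the cache/hit ratio visible, which is the number that decides whether caching is buying you anything at all.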
The result is another case of data control. If URL shortener & vanity host/URL adoption continues, and all URLs turn into redirects, we will have become completely dependent on services that appear to be unwilling to open up their databases. I would like any standard that emerges here to include the ability to unwind in bulk.
We have some new “Powered by Gnip” badges available for our customers, partners and anyone looking to add something pretty to their website.
The badges are available on our website here: http://www.gnip.com/partners
We have three styles to choose from depending on your color preference, and here is the orange/white style.

The real-time internet is getting more attention, and with that attention come places for people to talk about what it all means. Since Gnip focuses on delivering data for the real-time web, we end up being part of the conversation and of many solutions.
If you are planning or thinking about attending one of the upcoming conferences, let us know, as we will be attending.
1. Real-Time Stream CrunchUp, July 10th, Redwood City, CA
URL: http://www.techcrunch.com/real-time-stream-and-4th-annual-crunchup-at-august-capital/

2. Real-time Web 2009 Conference, July 29th, Mountain View, CA.
URL: http://rtw09.com/

This post is meant to provide a reminder and additional guidance for Gnip platform users as we transition to the new Twitter Streaming API at the end of the week. We have a lot going on and want to make sure companies and developers are keeping up with the moving parts.
Helpful information about the new Twitter Streaming API:
PS: The planned Facebook integration is coming along, and our internal prototype is complete. We are driving toward the beta and should have more details in the next week or two.
PPS: We would still appreciate any feedback people can provide on their Twitter data integration needs; please take the survey.
There has been some confusion around how to leverage Gnip’s Twitter Search (“twitter-search”) Publisher. We have work to do in order to clarify this use case from a usability/documentation standpoint, but in the meantime hopefully the following clarifies things a bit.
First off, “twitter-search” is a Polled Publisher which means it is subject to high latencies, as well as gaps in coverage. Secondly, we overload the “keyword” rule type in Filters in order to provide a mechanism for you to enter your http://search.twitter.com compatible queries (see http://search.twitter.com/operators for more information). Any query you can run on http://search.twitter.com, can be added to your Gnip filter as a “keyword” rule.
For example, if you search Twitter for “Boulder, CO” (including the quotes), Twitter considers that a literal, case-insensitive, phrase search; and so will Gnip. “Boulder, CO” (excluding the quotes), yields an OR search on Twitter; and hence the same in Gnip. If you search for “cars AND trucks” you get Boolean search operator behavior in Twitter, and subsequently in Gnip as well.
In short, we pass through the literal queries/strings that you hand Gnip, straight on through to Twitter. The “keywords” are opaque to Gnip. The only trick is in ensuring your “keywords” are entered into Gnip appropriately.
Through Gnip’s web interface, you can add comma-separated keywords to a Filter. This is usually straightforward; however, in the twitter-search Publisher case it takes extra care to get the results you want, especially when you want to include commas or quotes in your queries. As a result, the keywords entered in a twitter-search Publisher Filter must conform to CSV quoting to ensure your queries get executed properly.
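To make the quoting concrete, here is a small sketch using Python’s standard `csv` module. It assumes Gnip’s web form follows the common CSV convention (fields containing commas or quotes are wrapped in double quotes, and embedded quotes are doubled); the helper names `to_csv_keywords` and `from_csv_keywords` are made up for this example.

```python
import csv
import io


def to_csv_keywords(queries):
    """Join search queries into one CSV line, quoting only where needed."""
    buf = io.StringIO()
    csv.writer(buf, quoting=csv.QUOTE_MINIMAL).writerow(queries)
    return buf.getvalue().rstrip("\r\n")


def from_csv_keywords(line):
    """Split a CSV keyword line back into the individual queries."""
    return next(csv.reader([line]))


# A phrase search (quotes included), an OR search containing a comma,
# and a Boolean search, in that order.
queries = ['"Boulder, CO"', "Boulder, CO", "cars AND trucks"]
line = to_csv_keywords(queries)
print(line)
```

Under that convention the phrase search ends up wrapped in three double quotes on each side: one pair delimits the field, and each literal quote is doubled. Round-tripping the line through `from_csv_keywords` is a quick way to check that what you typed is what will be passed along as the query.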
Through Gnip’s REST interface, you encapsulate the keywords within XML <rule> elements, so the CSV quoting rules can be ignored.
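A sketch of what building such an element might look like follows. The `type="keyword"` attribute is an assumption based on the “keyword” rule type named in this post; check the Gnip API documentation for the exact schema. The `keyword_rule` helper is hypothetical.

```python
import xml.etree.ElementTree as ET


def keyword_rule(query):
    """Wrap one search query in a <rule> element (attribute name is assumed)."""
    rule = ET.Element("rule", {"type": "keyword"})
    rule.text = query  # ElementTree escapes &, < and > in text when serializing
    return ET.tostring(rule, encoding="unicode")


print(keyword_rule('"Boulder, CO"'))
print(keyword_rule("cars AND trucks"))
```

Commas and quotes pass through untouched here because the element boundaries, not commas, delimit each query; only XML’s own special characters need escaping, and the serializer handles that.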
For some further examples of how to add twitter-search keywords, see the Gnip API documentation.
As a final note, the overload of “keyword” rule types in Filters is something we’re experimenting with and is subject to change.
Last week we informed the community of our plans to transition to the new Twitter Streaming API. (see the blog post) This post is going to focus on providing some information on how Gnip Filters will be updated in order to support the new requirements of the Streaming API.
Here is a general summary of what Gnip users need to have in mind to prepare for the transition.
1) The Twitter Streaming API uses HTTP Basic Authentication to open up a connection. The authentication requires the Twitter Username:Password combination, and the account access tier is set at the Twitter account level.
2) The default Gnip support provided to users will be to the “spritzer” and “follow” tiers as these are public and can be accessed by any valid Twitter account.
3) Developers and companies that have use cases which require higher levels of access (gardenhose, shadow, birddog) need to send an email directly to Twitter at api@twitter.com. The email should include basic information about your use case, the access level that is required (gardenhose, shadow, birddog), and the Twitter account to map the access. Also, Twitter has a new URL to request access for the gardenhose level.
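For readers unfamiliar with HTTP Basic Authentication, the sketch below shows how the Username:Password pair from item 1 becomes a request header. The URL is a placeholder, not a real Twitter endpoint, and the helper names are hypothetical; the request is only constructed here, never sent.

```python
import base64
import urllib.request


def basic_auth_header(username, password):
    """Return the Authorization header value for HTTP Basic Authentication."""
    raw = f"{username}:{password}".encode("utf-8")
    return "Basic " + base64.b64encode(raw).decode("ascii")


def build_stream_request(url, username, password):
    """Build (but do not send) a request carrying Basic credentials."""
    request = urllib.request.Request(url)
    request.add_header("Authorization", basic_auth_header(username, password))
    return request


# Placeholder URL and credentials, for illustration only.
req = build_stream_request("https://stream.example.com/spritzer", "some_user", "some_password")
print(req.get_header("Authorization"))
```

Because Basic credentials are only base64-encoded, not encrypted, the account password travels with every connection attempt, which is worth keeping in mind when you hand a Twitter account to any intermediary.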
To provide a preview of what the new Gnip filters will offer, we wanted to include some screenshots of what we are working on at this time. (You will notice the prototypes were built using an updated user experience we are working on for a future release.)
Figure 1: Gnip Filter Creation
This is the start page for creating a Gnip filter that will connect to the new Twitter Streaming API. Users now will need to provide a valid Twitter account in order to support the HTTP Basic Authentication requirements of the API.
Figure 2: Gnip Filters will support the multiple tiers of the Twitter Streaming API
Twitter has multiple tiers for the Streaming API which will be supported in this update to the Gnip filters. In the developer web app or at the Gnip API it will be possible to select the Streaming API tier that the filter will access.
When we started Gnip last year Twitter was among the first group of companies that understood the data integration problems we were trying to solve for developers and companies. Because Gnip and Twitter were able to work together it has been possible to access and integrate data from Twitter by using the Gnip platform since last July using Gnip Notifications, and since last September using Gnip Data Activities.
All of this data access was the result of Gnip working with the Twitter XMPP “firehose” API to provide Twitter data access for users of both the Gnip Community and Standard edition product offerings. Recently Twitter announced a new Streaming API and began an alpha program to start making the new API available. Gnip has been testing the new Streaming API, and we are now planning to move from the current XMPP API to the new Streaming API in the middle of June. This transition will mean some changes in the default behavior and in the ability to access Twitter data, as described below.
New Streaming API Transition Highlights
Twitter has several additional Streaming API methods available to approved parties that require a signed agreement to access. To better understand which developers and companies using the Gnip platform could benefit from these other Streaming API options we would encourage Gnip platform users to take this short 12 question survey: Gnip: Twitter Data Publisher Survey (URL: http://www.surveymonkey.com/s.aspx?sm=dQEkfMN15NyzWpu9sUgzhw_3d_3d)
What About the Gnip Twitter-search Data Publisher?
The Gnip Twitter-search Data Publisher is not impacted by the transition to the new Twitter Streaming API since it is implemented using the new Gnip Polling Service and provides keyword-based data integration to the search.twitter APIs.
We will provide more information when we lock down the actual day for the transition shortly. Please take the survey and as always please contact us directly at info@gnip.com or send me a direct email at shane@gnip.com
Obviously we have some understanding of the concepts of pushing and polling data from service endpoints, since we basically founded a company on the premise that the world needed a middleware push data service. Over the last year we have had a lot of success with the push model, but we also learned that, for many reasons, we need to work with services via a polling approach as well. For this reason our latest v2.1 includes the Gnip Service Polling feature so that we can work with any service using push, poll or a mixed approach.
Now, the really great thing for users of the Gnip platform is that how Gnip collects data is mostly abstracted away. Every end-user developer or company has the option to tell Gnip where to push the data for which you have set up filters or hold a subscription. We also realize not everyone has an IT setup to handle push, so we have always provided HTTP GET support that lets people grab data from a Gnip-generated URL for their filters.
One place where the way Gnip collects data can make a difference, at this time, for our users is the expected latency of data. Latency here refers to the time between the activity happening (i.e. Bob posted a photo, Susie made a comment, etc) and the time it hits the Gnip platform to be delivered to our awaiting users. Here are some basic expectation setting thoughts.
PUSH services: When we have push services the latency experience is usually under 60 seconds, but we know that this is not always the case, since sometimes the services can back up during heavy usage and latency can spike to minutes or even hours. Still, when the services that push to us are running normally it is reasonable to expect 60 second latency or better, and this is consistent for both the Community and Standard Edition of the Gnip platform.
POLLED services: When Gnip is using our polling service to collect data the latency can vary from service to service based on a few factors:
a) How often we hit an endpoint (say 5 times per second)
b) How many rules we have to schedule for execution against the endpoint (say over 70 million on YouTube)
c) How often we execute a specific rule (i.e. every 10 minutes). Right now with the Community edition of the Gnip platform we are setting rule execution by default at 10 minute intervals and people need to have this in mind with their expectation for data flow from any given publisher.
Expectations for POLLING in the Community Edition: I am sure some people who just read the above stopped and said “Why 10 minutes?” We chose to focus on “breadth of data” as the initial use case for polling. Also, the 10 minute interval is for the Community edition (aka the free version). We can turn the dial: using the smarts built into the polling service feature, we can execute the right rules faster (i.e. every 60 seconds or faster for popular terms, and every 10, 20, etc. minutes or more for less popular ones). The key issue here is that for very prolific posters or very common keyword rules (e.g. “obama”, “http”, “google”), more posts can appear within the 10 minute default time-frame than we can collect in a single poll from the service endpoint.
For now the default expectation for our Community edition platform users should be a 10 minute execution interval for all rules when using any data publisher that is polled, which is consistent with the experience during our v2.1 Beta. If your project or company needs something a bit more snappy with the data publishers that are polled then contact us at info@gnip.com or contact me directly at shane@gnip.com as these use cases require the Standard Edition of the Gnip platform.
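To illustrate the “turn the dial” idea, here is one simple way a per-rule interval could be adjusted. This is not Gnip’s actual scheduler, only a sketch under stated assumptions: a poll that comes back full suggests posts are being missed, so the interval is halved, and an empty poll suggests the rule can wait longer. The 20 minute ceiling and the `page_limit` parameter are assumptions made for the example.

```python
DEFAULT_INTERVAL = 600  # seconds; the 10 minute Community edition default
MIN_INTERVAL = 60       # seconds; the "every 60 seconds" figure for popular terms
MAX_INTERVAL = 1200     # seconds; assumed ceiling for quiet rules


def next_interval(current, returned, page_limit):
    """Pick the next polling interval for a rule from its last poll result.

    current    -- interval in seconds used for the last poll
    returned   -- number of items the endpoint returned
    page_limit -- most items the endpoint can return in a single poll (assumed known)
    """
    if returned >= page_limit:
        # A full page means more posts probably existed than we could collect.
        return max(MIN_INTERVAL, current // 2)
    if returned == 0:
        # Nothing new: back off so the endpoint budget goes to busier rules.
        return min(MAX_INTERVAL, current * 2)
    return current


print(next_interval(DEFAULT_INTERVAL, returned=100, page_limit=100))
```

A rule such as “obama” that fills every page would step down from 600 to 300, 150, 75 and then settle at the 60 second floor, while a rarely matched rule would drift up to the ceiling.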
Current pushed services on the platform include: WordPress, Identi.ca, Intense Debate, Twitter, Seesmic, Digg, and Delicious
Current polled services on the platform include: Clipmarks, Dailymotion, deviantART, diigo, Flickr, Flixster, Fotolog, Friendfeed, Gamespot, Hulu, iLike, Multiply, Photobucket, Plurk, reddit, SlideShare, Smugmug, StumbleUpon, Tumblr, Vimeo, Webshots, Xanga, and YouTube
Jeremy Hinegardner has written a super cool utility (he calls it Snipe) in Ruby that uses Gnip Notifications to optimize your data collection needs. In a nutshell, it digests Gnip Notifications for the Twitter Publisher (though it could obviously be re-purposed for any Publisher) and pings Twitter to retrieve the tweets associated with said Notifications; rounding out Gnip <activity>s. Enjoy, and hats off to Jeremy; well done.
We are pleased to announce an early access program for a new Gnip data publisher to access and integrate data from the Facebook Platform Open Streams API.
Companies will realize immediate benefits from choosing to use the Gnip Platform for integrating data from Facebook.
Developers and companies can sign up right now to be notified when the early access program is launched by sending an email to info@gnip.com with the subject: Facebook. Any company signing up for the early access program will be eligible for three free months of subscription service to the Gnip data publisher for the Facebook Platform once it is generally released. At this time the early access program is planned to launch in the summer.
And to provide a small taste of the upcoming integration here are two examples of what common Newsfeed actions on Facebook will look like when accessed via the planned Gnip data publisher.
1) Status update example (the fbids in this example were changed from the actual ones in my stream item)
<activities publisher="facebook">
<activity>
<at>2009-05-16T14:07:25.000Z</at>
<action>post</action>
<activityID>http://www.facebook.com/profile.php?aid=6&amp;id=12345&amp;ref=at</activityID>
<actor metaURL="http://www.facebook.com/people/Shane-Pearson/12345">Shane Pearson</actor>
<destinationURL>http://www.facebook.com/profile.php?id=12345&amp;story_fbid=12345</destinationURL>
<payload>
<body>It must be spring as my weekly trip to Lowes/Home Depot is back on the schedule</body>
</payload>
</activity>
</activities>
2) Upload photo example (the below Gnip data schema maps to a Facebook activity stream example)
<activities publisher="facebook">
<activity>
<at>2009-04-06T21:23:00-07:00</at>
<action>upload</action>
<activityID>http://www.facebook.com/album.php?aid=6&amp;id=499225643&amp;ref=at</activityID>
<actor metaURL="http://www.facebook.com/people/Snapshot-Smith/499225643">Snapshot Smith</actor>
<destinationURL>http://www.facebook.com/people/Snapshot-Smith/499225643</destinationURL>
<payload>
<title>Snapshot Smith uploaded a photo.</title>
<body><p><a href="http://www.facebook.com/photo.php?pid=28&amp;id=499225643&amp;ref=at" caption="A very attractive wall, indeed"></a></p>
</body>
<mediaURL type="thumbnail">http://photos-e.ak.fbcdn.net/photos-ak-snc1/v2692/195/117/499225643/s499225643_28_6861716.jpg</mediaURL>
<mediaURL type="content">http://www.facebook.com/photo.php?pid=28&amp;id=499225643&amp;ref=at</mediaURL>
</payload>
</activity>
</activities>
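For a sense of how a consumer might read documents shaped like these, here is a minimal parsing sketch using Python’s standard library. The embedded sample is a trimmed, well-formed copy of the status update example; the `summarize` helper is hypothetical, and the final schema of the planned publisher may differ.

```python
import xml.etree.ElementTree as ET

SAMPLE = """<activities publisher="facebook">
  <activity>
    <at>2009-05-16T14:07:25.000Z</at>
    <action>post</action>
    <actor metaURL="http://www.facebook.com/people/Shane-Pearson/12345">Shane Pearson</actor>
    <payload>
      <body>It must be spring as my weekly trip to Lowes/Home Depot is back on the schedule</body>
    </payload>
  </activity>
</activities>"""


def summarize(xml_text):
    """Return one dict per <activity> with the fields most consumers want."""
    root = ET.fromstring(xml_text)
    publisher = root.get("publisher")
    rows = []
    for activity in root.iter("activity"):
        rows.append({
            "publisher": publisher,
            "at": activity.findtext("at"),
            "action": activity.findtext("action"),
            "actor": activity.findtext("actor"),
            "body": activity.findtext("payload/body"),
        })
    return rows


for row in summarize(SAMPLE):
    print(row["at"], row["actor"], row["action"], row["body"])
```

Fields that are absent from a given activity, such as a body on some uploads, simply come back as `None`, so the same loop can handle both example shapes.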