Geo Code http://geocode.hyperpublic.com Hyperpublic Local Data Engineering Blog posterous.com Wed, 21 Dec 2011 14:11:00 -0800 An Introduction to the Hyperpublic API, useful Python tools, and .... NYC Pizza http://geocode.hyperpublic.com/an-introduction-to-the-hyperpublic-api-useful-72301 http://geocode.hyperpublic.com/an-introduction-to-the-hyperpublic-api-useful-72301

The following post was written by our newest team member @deland, and it is something between an introduction to the Hyperpublic API and a documentation of a first experience with it. The actual use case described below is just a toy application - but it provided an opportunity for some cool visualizations. We hope you enjoy it and come up with some cool uses of your own!

If you haven't already, the first thing you're going to need to do to get started with the API and follow along (or experiment on your own) is to register for a Hyperpublic API key. It's free! It's fun! There are API wrappers for many programming languages so you can choose your favorite and get started right away. I'm going to be describing my method using Python, but the syntax in all the languages is pretty straightforward.

We're going to focus on pizzerias in New York City because everyone loves pizza (and because we had to choose something). The Hyperpublic API is a window into the world indexed by (exact) location. We can query the API by sending it addresses, zip codes, phone numbers, or latitudes and longitudes as well as categorical information (more on this in a moment). The query we send the API is then resolved to a specific latitude and longitude, and nearby places satisfying the query are returned. This means if we send the API a generic location like "New York, NY", a specific latitude and longitude is set, and then the search is executed. What this really means is you may not get what you wanted if you use such a generic location! The parameters available to query the API are described here, as is the structure of the documents that will be returned to you.

Categorical information is easy to include in the query. We can ask for places with a specific category (restaurant, office, etc.) or ask for the database to be searched for other text (via the parameter 'q', for query). All locations with a place name or tag matching your text will be returned. So we will query the API using the location parameter and also q = 'pizza'. For reasons that will become clear soon, I took the results of the queries and stored them in a local Mongo database. To get all pizzerias in New York City, I looped over all the zip codes in New York. The API only returns 50 locations at a time, but we can ask for more. Here is the relevant code:

 

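from hyperpublic import *
from pymongo.errors import DuplicateKeyError
...
for zipcode in zipcodes:
    inserted = 0
    notinserted = 0
    # Request up to `rounds` pages of 50 results for this zip code.
    for p in range(1, rounds + 1):
        try:
            items = hp.places.find(location=str(zipcode), q="pizza", page=p, page_size=50)
        except Exception:
            # Stop paging this zip code as soon as a request fails.
            break
        for item in items:
            try:
                # The unique key on the id field rejects places already stored.
                dbh.nypizza.insert(item, safe=True)
                inserted += 1
            except DuplicateKeyError:
                notinserted += 1
    print("%d items inserted and %d duplicates found for zipcode %s" % (inserted, notinserted, zipcode))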

Here, my local MongoDB connection is called dbh and the collection is called nypizza. Because some zip codes don't cover much area, many of these queries will return the same pizzerias as other queries. This may not be a problem depending on your application, but you should be aware of it (I prevented Mongo from duplicating records by putting a unique key on the id field). Now we have the locations of all the pizza places in the city (I get 2439). How easy was that?!
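The duplicate handling can also be sketched without a database. The snippet below is a minimal in-memory stand-in for the unique key on the id field, assuming each returned place is a dict with an `id` entry. The function name `insert_unique` and the sample records are hypothetical.

```python
def insert_unique(store, items, key="id"):
    """Insert items into store (a dict keyed by `key`), skipping duplicates.

    Returns (inserted, duplicates), mirroring the counters in the loop above.
    """
    inserted = 0
    duplicates = 0
    for item in items:
        if item[key] in store:
            duplicates += 1
        else:
            store[item[key]] = item
            inserted += 1
    return inserted, duplicates


# Hypothetical results from two overlapping zip-code queries.
store = {}
first = insert_unique(store, [{"id": "a1"}, {"id": "b2"}])
second = insert_unique(store, [{"id": "b2"}, {"id": "c3"}])
```

The second call stores only the place it has not seen before and counts the other as a duplicate, which is the same bookkeeping the Mongo loop does with DuplicateKeyError.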

We probably want to visualize all the locations at once. There are many options for doing this, and many existing Python libraries to make our lives easier. For visualization only, my favorite tool (so far) was created by Seth Golub. It creates a heatmap based on the density of points, and can overlay the 'heat' on top of maps coming from OpenStreetMap. This image is available under CC-BY-SA.

Pizza

Depending on your needs, you can also overlay this information onto Google Earth maps or Gmaps. The nicest tool I found for this is hosted here. The interface is incredibly easy. Here is the output (it spits out KML data, which Google Earth can read and which can also be accessed by Google Maps):

Pizza_map
Let's visualize the network of pizzerias in the city. I added a geo index to the Mongo database. Then for every pizza place X, I can query the database for all other pizzerias within (roughly) 0.1 miles of X. Every time I find a relationship like this, I create an edge in a graph using the awesome network/graph software package NetworkX. Here is the result:

Pizza_graph3
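The edge-building step can be sketched in plain Python. This is a minimal stand-in, assuming places are given as a dict of name to (lat, lon): a haversine distance replaces the Mongo geo query, and an adjacency dict replaces the NetworkX graph. The function names and the sample coordinates are hypothetical.

```python
import math
from itertools import combinations

EARTH_RADIUS_MILES = 3958.8


def haversine_miles(a, b):
    """Great-circle distance in miles between two (lat, lon) pairs in degrees."""
    lat1, lon1 = map(math.radians, a)
    lat2, lon2 = map(math.radians, b)
    h = (math.sin((lat2 - lat1) / 2) ** 2
         + math.cos(lat1) * math.cos(lat2) * math.sin((lon2 - lon1) / 2) ** 2)
    return 2 * EARTH_RADIUS_MILES * math.asin(math.sqrt(h))


def build_graph(places, max_miles=0.1):
    """Return an adjacency dict linking every pair of places within max_miles."""
    graph = {name: set() for name in places}
    for (n1, p1), (n2, p2) in combinations(places.items(), 2):
        if haversine_miles(p1, p2) <= max_miles:
            graph[n1].add(n2)
            graph[n2].add(n1)
    return graph


# Hypothetical coordinates, not real pizzerias.
places = {
    "A": (40.7300, -73.9900),
    "B": (40.7305, -73.9900),  # about 0.035 miles north of A
    "C": (40.7500, -73.9900),  # more than a mile away from both
}
graph = build_graph(places)
```

Comparing every pair is quadratic in the number of places, which is fine for a few thousand pizzerias. A geo index, as used in the post, avoids the all-pairs comparison on larger datasets.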

You might start wondering, "If I wanted to go on a 'pizza-slice' tour of New York City, and I wanted the longest sequence of pizzerias, each within 0.1 miles of the next, without visiting any place twice, what would I do?" Coincidentally, I wondered the exact same thing, and wrote a short function to search the graph for the longest such path. (For those who care, finding the longest path in a graph with cycles is known to be NP-hard. So don't try this on your big graphs.) Here is the result: you can visit 32 pizzerias (labeled, in order, A-Z, then 0-5). Best of luck.

Pizza_tour
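For the curious, a brute-force search of this kind can be sketched as follows. This is a minimal exhaustive depth-first search, not necessarily the function used for the tour above. It assumes the graph is a plain dict mapping each node to its neighbors, and the toy graph is made up for illustration.

```python
def longest_simple_path(graph):
    """Exhaustive depth-first search for the longest path with no repeated node.

    graph maps each node to an iterable of its neighbors. Runtime is
    exponential in the worst case, so this only suits small graphs.
    """
    best = []

    def extend(path, visited):
        nonlocal best
        if len(path) > len(best):
            best = list(path)
        for neighbor in graph[path[-1]]:
            if neighbor not in visited:
                visited.add(neighbor)
                path.append(neighbor)
                extend(path, visited)
                # Backtrack so other branches can reuse this node.
                path.pop()
                visited.remove(neighbor)

    for start in graph:
        extend([start], {start})
    return best


# Hypothetical toy graph: a chain A-B-C-D with a spur B-E.
toy = {
    "A": ["B"],
    "B": ["A", "C", "E"],
    "C": ["B", "D"],
    "D": ["C"],
    "E": ["B"],
}
tour = longest_simple_path(toy)
```

On the toy graph the search settles on a four-stop tour. Several tours tie for the longest, and the function keeps the first one it finds.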

http://posterous.com/images/profile/missing-user-75.png http://posterous.com/users/eiXolTgEhEGp4 mdeland mdeland mdeland
Mon, 05 Dec 2011 10:04:00 -0800 We're looking for a super sweet Developer Evangelist http://geocode.hyperpublic.com/were-looking-for-a-super-sweet-developer-evan http://geocode.hyperpublic.com/were-looking-for-a-super-sweet-developer-evan

We're looking for an outgoing, technically minded person to represent Hyperpublic in the developer community. You love technology and people. You’ll craft and execute an outreach strategy to get the word out about our platform with developers and companies. Dream up any creative idea and make it happen. You’ll get to work with some of the smartest hackers, designers, and entrepreneurs in the technology startup community.

Responsibilities
  • Advocate for Hyperpublic at meet ups, conferences, hackathons as well as on forums, email, social media etc…
  • Brainstorm great location aware applications and services with developers
  • Write blog posts about apps and web services using location data
  • Listen to the community's needs and ensure that our engineering and product teams are building the right thing
  • Create fun tutorials and example hacks to show off our API
  • Do whatever it takes to make Hyperpublic top of developers’ minds
Requirements
  • Up to speed with current web and mobile applications
  • Prior experience with community management, developer relations, or software engineering
  • Excellent written, oral, and presentation skills
  • Passionate about the tech industry
  • Leadership experience in a team setting, through hosting events, or building an online community
  • Degree in one or both ends of the creative/technical spectrum: Computer Science (or something similarly technical) or a major focused on writing (such as English Literature)
Preferred
  • Can quickly build a lightweight proof of concept using the Hyperpublic API
  • Prior startup experience, preferably in New York
  • Understand RESTful APIs and the general web stack
  • Programming skills in a modern language like Ruby, Python, and/or Javascript
Interested or can make an introduction? Let us know at jobs@hyperpublic.com. We're also hiring for product designer, data engineers, and front end magicians.

Img_0281

http://files.posterous.com/user_profile_pics/1502949/jeff-weinstein-300x300.jpeg http://posterous.com/users/4afJK7rGgypb Jeff Weinstein jweinstein Jeff Weinstein
Sat, 05 Nov 2011 07:25:00 -0700 Reinvent Local @ General Assembly This Weekend http://geocode.hyperpublic.com/reinvent-local-general-assembly-this-weekend http://geocode.hyperpublic.com/reinvent-local-general-assembly-this-weekend

This weekend General Assembly is hosting the Reinvent Local Hack Day. Hyperpublic, and companies like American Express Open Forum, Yipit, NYC.gov, ordr.in, and more will be presenting tools and services that they've built aimed at helping local businesses connect with their communities. 

If you have any good ideas for applications that you'd like to see built that help you interact with the local businesses around you then swing on by General Assembly, or tweet using the #reinventlocal hashtag, and maybe your idea will come to life this weekend.

 

Screen_shot_2011-11-05_at_10

http://files.posterous.com/user_profile_pics/390132/dougfacebooksquare.jpg http://posterous.com/users/3FrYNYvW Doug Petkanics the dob Doug Petkanics
Fri, 04 Nov 2011 16:22:00 -0700 Ring ring, ring ring. It's you calling Places+ http://geocode.hyperpublic.com/ring-ring-ring-ring-its-places-calling http://geocode.hyperpublic.com/ring-ring-ring-ring-its-places-calling

Want to find a place by phone number? Now it’s easy using the Hyperpublic platform. As a developer, you can easily search our Places+ product by phone number.

We built this based on developer requests so please let us know what to build next. Who should you call? Well, whichever company is at 212 229 2217 of course:

You can also query using formatted phone numbers (e.g. with a leading +1, a trailing extension, various non-digit characters, etc.) using the same phone_number parameter. Here's the same query using +1 (212) 229 - 2217:
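As a rough illustration, both queries can be built as plain URLs in Python. This sketch assumes the base URL, the /places endpoint, and the client_id and client_secret parameters described in our API introduction post. The helper name and the placeholder credentials are hypothetical.

```python
from urllib.parse import urlencode

BASE_URL = "https://api.hyperpublic.com/api/v1"


def phone_search_url(phone_number, client_id, client_secret):
    """Build a Places+ search URL for the given phone number."""
    params = {
        "phone_number": phone_number,
        "client_id": client_id,
        "client_secret": client_secret,
    }
    # urlencode escapes spaces, "+", and parentheses so the URL stays valid.
    return BASE_URL + "/places?" + urlencode(params)


# Placeholder credentials; substitute the values from your own registration.
plain = phone_search_url("212 229 2217", "YOUR_CLIENT_ID", "YOUR_CLIENT_SECRET")
formatted = phone_search_url("+1 (212) 229 - 2217", "YOUR_CLIENT_ID", "YOUR_CLIENT_SECRET")
```

Encoding matters most for the formatted number: an unescaped "+" in a query string is usually read as a space, so the leading +1 would be lost.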

http://posterous.com/images/profile/missing-user-75.png http://posterous.com/users/hgVVFDiIJJwDM mkscrg mkscrg mkscrg
Wed, 02 Nov 2011 14:45:00 -0700 APIs For Beginners @Hyperpublic http://geocode.hyperpublic.com/apis-for-beginners-hyperpublic http://geocode.hyperpublic.com/apis-for-beginners-hyperpublic

On Thursday, November 3rd, Hyperpublic's head of engineering, Doug Petkanics, will be teaching a Skillshare class called "APIs For Beginners." The class will start with the basics, and will cover the following:

  • What is an API?
  • What type of data and services do companies make available via APIs?
  • Why would I use APIs?
  • How do I use APIs?
There is no prior programming experience required, but as APIs are quite literally "programming interfaces", you will have to do some light programming in order to make use of the data you get through the APIs. 

We'll talk about APIs from companies like Flickr, Paypal, Google, and Twitter. You'll learn how to get data from them and what you can do with their services. If you're interested in getting started with building web applications, then this class will provide a good introduction to the types of data and services you can leverage to speed up your development and enhance your applications.

Limited tickets are still available so sign up today. All proceeds from the ticket sales will be donated to HackNY.

http://files.posterous.com/user_profile_pics/390132/dougfacebooksquare.jpg http://posterous.com/users/3FrYNYvW Doug Petkanics the dob Doug Petkanics
Thu, 27 Oct 2011 10:56:00 -0700 Haskell + Hyperpublic = <3 http://geocode.hyperpublic.com/haskell-hyperpublic-3 http://geocode.hyperpublic.com/haskell-hyperpublic-3

Do you love pure functional programming, strong typing, *and* Hyperpublic's API? Well you're in luck: there is now an official API library for the Haskell language. Check it out on GitHub (https://github.com/mkscrg/hyperpublic-haskell) or Hackage (http://hackage.haskell.org/package/hyperpublic). The docs on Hackage and the source distribution include some example code, and usage questions can be posted to this list.

Let us know what you think!

http://files.posterous.com/user_profile_pics/1502949/jeff-weinstein-300x300.jpeg http://posterous.com/users/4afJK7rGgypb Jeff Weinstein jweinstein Jeff Weinstein
Mon, 03 Oct 2011 16:09:06 -0700 HackNY Hacks Built on Hyperpublic http://geocode.hyperpublic.com/hackny-hacks-built-on-hyperpublic http://geocode.hyperpublic.com/hackny-hacks-built-on-hyperpublic This weekend computer science students from schools throughout the entire Northeast descended upon NYU to participate in the HackNY Fall 2011 Hackathon. After 24 hours of coding, students presented what they built. Many of the students used Hyperpublic's Places+ and Geo Deals & Events products to power their hacks. Some of our favorites are listed below.

AdRunner (Winner of 2nd Place Prize)

YPNHOI (Winner of "Most Schools" Prize)

YPNHOI stands for You've Probably Never Heard Of It. This app aims to surface places that aren't being talked about on twitter, that people aren't checking into on Foursquare, and that aren't being reviewed on Yelp. If you've heard of it, it ain't cool.

CheapChap

Screen_shot_2011-10-03_at_6

CheapChap helps plan your date in a cost effective way, by recommending cheap gifts, restaurants, and hotels.

Smilify.me

Screen_shot_2011-10-03_at_7

Smilify.me takes a photo of you every 5 minutes, and it charts your mood over time. If it determines that you're in a bad mood, it performs an intervention and shows you a picture of a lolcat or suggests that you go to take a walk in a nearby park (discovered through Hyperpublic of course). 

There were many great apps built by some incredibly talented students. We look forward to presenting and participating in many future hackathons.

http://files.posterous.com/user_profile_pics/390132/dougfacebooksquare.jpg http://posterous.com/users/3FrYNYvW Doug Petkanics the dob Doug Petkanics
Sat, 01 Oct 2011 11:36:00 -0700 Fall 2011 HackNY Hackathon http://geocode.hyperpublic.com/fall-2011-hackny-hackathon http://geocode.hyperpublic.com/fall-2011-hackny-hackathon
Photo-3

Hyperpublic is proud to be presenting today at the Fall 2011 HackNY Hackathon. HackNY is a non-profit organization in NYC whose aim is to "federate the next generation of hackers for the New York innovation community". Twice per academic year, once in the fall and once in the spring, they bring together undergrad and graduate computer science students from all over the Northeast for a 24-hour hackathon hosted at NYU.

This year Hyperpublic sponsored a team of 6 students from CMU by transporting them from Pittsburgh and setting them up with lodging, pizza, and caffeine - the staples of any successful weekend hack. We'll be presenting our places database and our deals and events API to the 300+ students who have shown up for the event, and hopefully some cool hacks will get built using Hyperpublic data.

For anyone in attendance, we'll be around for a good portion of the 24-hour hacking period, and we're always reachable at @hyperpublic or at contact@hyperpublic.com.

http://files.posterous.com/user_profile_pics/390132/dougfacebooksquare.jpg http://posterous.com/users/3FrYNYvW Doug Petkanics the dob Doug Petkanics
Fri, 30 Sep 2011 06:44:00 -0700 Hyperpublic Brings CMU to New York http://geocode.hyperpublic.com/hyperpublic-brings-cmu-to-new-york http://geocode.hyperpublic.com/hyperpublic-brings-cmu-to-new-york

Two of Hyperpublic's engineers, @ericxtang and @zanecstarr, as well as a couple of our investors and advisors are alums of Carnegie Mellon's computer science program, so we're eager to support CMU students when they're interested in learning about the NYC startup scene. This weekend, about 6 CMU undergrads were looking to make the trip from Pittsburgh to New York in order to participate in a weekend hack event with other local college students. We were going to sponsor their bus tickets, but Eric thought it'd be more fun to get out to Pittsburgh and road trip with the gang. Details of their journey to follow today...

 

6:30am - Rise and Shine

Photo

 

7:30am - Goodbye New York (for a few hours)

Screen_shot_2011-09-30_at_9

 

9:55am - Hello Pittsburgh

If you see this van on the road, watch out for serious hacking going on inside. In the words of Eric (and Jay-Z), "No chrome on my wheels, but I'm a balla for real!" 

 

11:55am - Arrival at CMU

Photo-2

 

2:55pm - Nap time in the van back to NYC

 

6:20pm - NYC Pulls into view

Photo-5

 

8:20pm - Donatello's Pizza after a long trip. Totally hits the spot!

Photo_5

 

11:30pm - After a long day of traveling, eating and hanging out, the team retires to the AirBnb apartment in downtown Manhattan.  Rest up for a weekend of hacking!

Airbnb

 

http://files.posterous.com/user_profile_pics/390132/dougfacebooksquare.jpg http://posterous.com/users/3FrYNYvW Doug Petkanics the dob Doug Petkanics
Thu, 29 Sep 2011 07:13:00 -0700 Geo Deals and Events FAQ http://geocode.hyperpublic.com/geo-deals-and-events-faq http://geocode.hyperpublic.com/geo-deals-and-events-faq

This FAQ answers many of the commonly asked questions about our Geo Deals and Events product. If you have a question that isn't answered here, feel free to email us at affiliates@hyperpublic.com.

What type of data does a call to the deals and events api provide?
The /offers API endpoint returns local daily deals and local events like concerts and meetups. Let's take a look at one example deal element returned by the API...

Most of the fields are pretty self-explanatory. The price field is the cost in dollars of the deal, and the value field is what the deal is presumably worth. The payout field is the amount you will be paid upon each click that your app generates to the url field. The image field is useful for your UI, and the place field gives you location information to plot the deal on a map, or link to a Hyperpublic Places+ place for more information.

How does it work?
After integrating the Geodeals & Events API into your application, Hyperpublic will pay for each click to an offer’s URL. Currently, there is a flat rate per click, but performance-based rates are coming soon.
 
How much do I get paid?
Every deal has a payout field which lists how much a click to that deal is worth. For each click your app generates to the url given in the url field, you will earn the amount given in the payout field.
 
How do I get paid?
You will get paid via check within 45 days of the end of the month if you have earned more than $50. If you require payment early, send us an email to affiliates@hyperpublic.com. When you qualify for payment, Hyperpublic will be in touch to determine where to send your check, so be sure your contact info is correct in your application registration page.
 
When do I get paid?
You’ll be paid by check within 45 days of the end of the month if you’ve earned more than $50. If you require early payment, send an email to affiliates@hyperpublic.com.
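To make the arithmetic concrete, here is a small sketch of how an app might estimate its earnings from the payout field. It assumes one logged entry per click, a payout value that parses as a number, and that the $50 threshold applies to the total being paid out. The function names and the click log are hypothetical.

```python
PAYMENT_THRESHOLD = 50.00  # dollars, per the FAQ answer above


def monthly_earnings(clicked_offers):
    """Sum the payout field over one month of clicks (one entry per click)."""
    return sum(float(offer["payout"]) for offer in clicked_offers)


def qualifies_for_check(total):
    """A check is sent only when earnings exceed the threshold."""
    return total > PAYMENT_THRESHOLD


# Hypothetical click log: 300 clicks on an offer paying $0.25 per click.
clicks = [{"payout": "0.25"}] * 300
total = monthly_earnings(clicks)
eligible = qualifies_for_check(total)
```

For real accounting, decimal.Decimal would be a safer choice than float. The Click Data export remains the authoritative record of what was earned.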
 
Can I see how much traffic I’m generating?
Your Application Management page includes a “Click Data” link for each of your applications. Currently, this provides a tabular view of click volume over various time periods and an option to export the data to CSV format. Improvements to this interface are coming soon.
 
Why are the Offers’ URLs so long?
Hyperpublic uses long links to track the traffic your application generates. Each link is unique to the application and the particular offer. Users are redirected to the offer vendor’s page. If you would like to serve shorter links, please use a URL shortener like Bit.ly or Goo.gl.

Some deals and events have a 0.00 payout. What gives?
Many events are free, or do not pass on affiliate fees. This data is still valuable to developers building certain classes of applications so we still include it. If you would like to get paid for every click you generate, only show deals with non-zero payouts. Our developer docs show how to request only deals with non-zero prices.
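If you would rather filter on your side, a few lines of Python are enough. This sketch assumes each offer has been parsed into a dict with the payout and url fields described above. The sample offers and their URLs are made up.

```python
def paying_offers(offers):
    """Keep only offers whose payout field is greater than zero."""
    return [offer for offer in offers if float(offer.get("payout", 0)) > 0]


# Hypothetical offers; real responses carry more fields than shown here.
offers = [
    {"url": "https://example.com/free-concert", "payout": "0.00"},
    {"url": "https://example.com/half-price-sushi", "payout": "0.25"},
]
shown = paying_offers(offers)
```

Offers with a missing payout field are treated as paying nothing and dropped, which errs on the side of showing only clicks that earn.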

I am a deals provider. How do I include my deals in Hyperpublic?
Just email us at affiliates@hyperpublic.com. If you have an API or a daily feed containing all the information about the deals you offer, then we're happy to include your data in our product as soon as possible.

How do I get started?
Our developer docs explain how to use our deals product. First you register for an API key, and then you can begin making calls immediately through your web browser, a command line client like CURL, or through one of our API libraries for the language of your choice.

http://files.posterous.com/user_profile_pics/390132/dougfacebooksquare.jpg http://posterous.com/users/3FrYNYvW Doug Petkanics the dob Doug Petkanics
Wed, 28 Sep 2011 11:32:00 -0700 Introducing the Hyperpublic API http://geocode.hyperpublic.com/introducing-the-hyperpublic-api http://geocode.hyperpublic.com/introducing-the-hyperpublic-api

The last few months have been busy at Hyperpublic and we’d love to tell you about two great local data APIs: Places+ and Geo Deals and Events.

Places+ returns nearby businesses and other points of interest (POI). Our places data set has the freshest and most accurate information from the web. You can query the API from many popular languages and use it in your web and mobile applications.

Geo Deals and Events is a collection of real time local offers including daily deals, meet ups, exhibitions and other types of events that can be integrated into any app or web service. When your users interact with these offers, you get paid through our local offers monetization plan.

For this post, we'll focus on calling the API through simple URLs that you can try in your browser. We’ll walk through what the requests and results look like and how they're constructed.

The Places+ API returns points of interest around a specific location. Here are places around Hyperpublic HQ:

Click and you'll get a block of raw JSON. 

That's an array of JSON objects: each one represents a place in the Places+ database. Each place includes useful information like its name, lists of tags and properties, and its locations with street address and latitude/longitude.

In most cases, API calls are simply GET requests to (carefully constructed) URLs. Let’s review each piece of the example above:

  • https://api.hyperpublic.com/api/v1 — This is the base URL for Hyperpublic API requests. Hyperpublic requires SSL for all requests.
  • /places — Include this path after the base URL to access the Places+ endpoint. Other endpoints use different paths.
  • ?...&...&... — These are query parameters. Following the ? is an &-separated list of key=value pairs. You use these to define your search and pass your authorization credentials:
    • address=416%20W%2013th%20St%2C%20NY%2C%20NY%2C%2010014 — The address parameter takes a URL-encoded address. For this example, we URL-encoded "416 W 13th St, NY, NY, 10014".
    • client_id=..., client_secret=... — These are the Client ID and Client Secret values you're given when you register an application. They must be included in every API call. (The values used in this post are just for documentation and examples. Don't use them in your own applications!)
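The pieces above can be assembled programmatically. Here is a minimal Python sketch that reproduces the encoded address from the list. The helper name `places_url` and the placeholder credentials are hypothetical; everything else comes from the breakdown above.

```python
from urllib.parse import quote, urlencode

BASE_URL = "https://api.hyperpublic.com/api/v1"


def places_url(client_id, client_secret, **search):
    """Build a Places+ search URL from keyword search parameters."""
    params = dict(search, client_id=client_id, client_secret=client_secret)
    # quote_via=quote encodes spaces as %20, matching the example above.
    return BASE_URL + "/places?" + urlencode(params, quote_via=quote)


# Placeholder credentials; substitute the values from your own registration.
url = places_url(
    "YOUR_CLIENT_ID",
    "YOUR_CLIENT_SECRET",
    address="416 W 13th St, NY, NY, 10014",
)
```

Letting urlencode do the escaping avoids hand-building strings like %2C and keeps the key=value pairs joined with & in the right places.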

By default, all location queries return at most 10 results inside a 2 km radius. You can override the defaults by passing the relevant query parameters. To get more results, include limit=50, or to search within a 0.5 km radius, include radius=0.5. How about the closest Japanese restaurant to Madison Square Garden? Combine category=japanese with the lat, lon, and limit parameters, and you're there:
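A query like that can be sketched in Python as follows. The coordinates for Madison Square Garden are approximate and are an assumption of this example, as are the helper name and the placeholder credentials. Using limit=1 to get the closest match also assumes results come back nearest first.

```python
from urllib.parse import urlencode

BASE_URL = "https://api.hyperpublic.com/api/v1"

# Approximate coordinates of Madison Square Garden (an assumption of this sketch).
MSG_LAT, MSG_LON = 40.7505, -73.9934


def nearby_url(lat, lon, client_id, client_secret, **overrides):
    """Build a /places query around a point; overrides replace the API defaults."""
    params = {"lat": lat, "lon": lon}
    params.update(overrides)  # e.g. category, limit, radius
    params.update(client_id=client_id, client_secret=client_secret)
    return BASE_URL + "/places?" + urlencode(params)


url = nearby_url(MSG_LAT, MSG_LON, "YOUR_CLIENT_ID", "YOUR_CLIENT_SECRET",
                 category="japanese", limit=1)
```

Leaving out limit and radius falls back to the defaults mentioned above: at most 10 results inside a 2 km radius.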

You can get a specific place by making requests to /places/ID, where ID is the value of the id field in that place’s JSON representation. Here's how to get that same sushi restaurant without searching:

Calling the Geo Deals and Events endpoint is as easy as substituting /offers for /places in the request URL. The basic search and show functionalities are the same as with Places+, but some of the query parameters are different. Let's find offers near Prospect Park, Brooklyn, that cost less than $50:

And we can request a specific offer the same way we requested a specific place:
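Both show requests follow the same pattern, so one small helper can cover them. This is a sketch assuming a single offer lives at /offers/ID just as a single place lives at /places/ID. The helper name, "PLACE_ID", "OFFER_ID", and the credentials are placeholders.

```python
from urllib.parse import urlencode

BASE_URL = "https://api.hyperpublic.com/api/v1"


def show_url(endpoint, resource_id, client_id, client_secret):
    """Build the URL for a single resource, e.g. /places/ID or /offers/ID."""
    if endpoint not in ("places", "offers"):
        raise ValueError("unknown endpoint: %r" % (endpoint,))
    credentials = urlencode({"client_id": client_id, "client_secret": client_secret})
    return "%s/%s/%s?%s" % (BASE_URL, endpoint, resource_id, credentials)


# "PLACE_ID" and "OFFER_ID" stand in for real id values from a search result.
place_url = show_url("places", "PLACE_ID", "YOUR_CLIENT_ID", "YOUR_CLIENT_SECRET")
offer_url = show_url("offers", "OFFER_ID", "YOUR_CLIENT_ID", "YOUR_CLIENT_SECRET")
```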

This is a basic introduction to the Hyperpublic API; check out the API documentation for a full reference. The documentation pages will be updated with new features as we roll them out, so check back often! Our Geo Code blog will have more posts to show how to use language-specific libraries to easily integrate Hyperpublic into your applications.

Got questions? Post to our API Developers Google Group and we'll be glad to help you out. Happy hacking!

http://posterous.com/images/profile/missing-user-75.png http://posterous.com/users/hgVVFDiIJJwDM mkscrg mkscrg mkscrg
Tue, 27 Sep 2011 08:02:45 -0700 Stanford Machine Learning Course Open To The Public Online http://geocode.hyperpublic.com/stanford-machine-learning-course-open-to-the http://geocode.hyperpublic.com/stanford-machine-learning-course-open-to-the Consider this a public service announcement relevant to all you aspiring data hackers out there: Stanford is offering their introductory undergraduate machine learning course to the public available online for free. The course will be taught by Professor Andrew Ng, and will consist of 2-3 weekly lectures delivered via online video, review questions, and programming exercises. Students will be able to submit questions and will be graded on their review question and programming exercise submissions. 

The material, while not trivial, requires no formal pre-requisites except for experience with at least one programming language, and probably some comfort with linear algebra. The web site says to expect to spend approximately 10 hours per week during the 10 week curriculum. To sign up, visit the course web site at http://www.ml-class.org.

At Hyperpublic we use machine learning in a number of places while building our places dataset. The most direct application is in our classification algorithms to determine whether a particular business might be a Japanese Restaurant, Female Shoe Store, or any of the other 300+ categories that we've defined in our ontology. We can also apply ML to solving data freshness problems and cross-referencing problems.

While we have experienced data gurus with academic ML training on the team, the rest of us academically curious engineers are looking forward to participating in the Stanford course. We'll be occasionally hosting study groups at the office and pestering one another to make sure everyone hands in their homework on time. Back to school!

http://files.posterous.com/user_profile_pics/390132/dougfacebooksquare.jpg http://posterous.com/users/3FrYNYvW Doug Petkanics the dob Doug Petkanics
Mon, 26 Sep 2011 04:59:00 -0700 What Do Developers Want From Local Data? http://geocode.hyperpublic.com/what-do-developers-want-from-local-data http://geocode.hyperpublic.com/what-do-developers-want-from-local-data

We've spent a lot of time over the past year talking with developers about what they are looking for in terms of local data. Everybody has different requests depending upon what they're building, however certain requests come up time and time again.

A Complete and Up-To-Date Places Database
This is priority number one. Everybody wants to know what restaurants, shops, bars, landmarks, and points of interest exist at what addresses and lat/lons. Whether you're building a restaurant recommendation app, or an app that shows users where to find the nearest coffee shop, you need a list of all the local businesses to build off of.

Rich Data
Having access to a POI database containing business names, phone numbers, and addresses is a start, but when you start to dig a little deeper into developers' wishes, you find that they generally want more. If we dig into the restaurant recommendation app, for example, we'll quickly find out that the developers really want to know what types of food the restaurants serve, what their hours are, photos of the exterior and interior of the restaurant, pricing information, and potentially even the items on the menu.

Offers and Deals
New York recently asked their citizens what apps they'd like to see built by developers during the NYC Big Apps competition - a contest in which developers submit local applications built on NYC public data sources. A fair number of popular suggestions asked for a way to find out what deals and offers are going on around me right now. These deals save money for consumers and allow them to try new things cheaply, and as such, they are a very effective local advertisement for businesses. Developers have taken note, and want an easy way to surface deal information to their users, and possibly make money in the process.

Useful Social Location Data
Who's tweeting, checking in, attending events, and posting photos about a particular location? There's a lot of noise created in the social channels, but there's also a lot of golden original content that can be sliced, diced, and analyzed for insights. Developers want access to this data, potentially in a filtered way to eliminate the noise.

Inventory Data
It is useful to know what is being sold, where, and at what price. It's hard to build efficient recommendations, or comparison shopping apps without access to this data, and amassing it across sources is tough.

There are plenty of other examples that developers have requested in the long tail, but the above represents the requests that we hear on a repeated basis. Hyperpublic has been working to make this data available to developers in an open manner. We have a lot of work to do going forward, but look for announcements and future posts soon detailing how exactly to power your local apps using the above data from Hyperpublic.

http://files.posterous.com/user_profile_pics/390132/dougfacebooksquare.jpg http://posterous.com/users/3FrYNYvW Doug Petkanics the dob Doug Petkanics
Tue, 02 Aug 2011 07:52:00 -0700 Developer Tech Talk - Monads http://geocode.hyperpublic.com/developer-tech-talk-monads http://geocode.hyperpublic.com/developer-tech-talk-monads

Screen_shot_2011-08-02_at_10
Is there anything more confusing to the non-Haskell coder than Monads? On Wednesday, August 3rd at 12:30pm, Hyperpublic's very own functional programming expert, @mkscrg, will be presenting a talk that aims to answer the question, "Are Monads the best thing since sliced bread?"

No spoilers here, you'll have to come to the talk to hear the answer. In the words of Mike...

Monads are one of the most talked about "features" of the Haskell language. The web abounds with monad tutorials and monad examples, yet they maintain a reputation for giving newcomers a hard time. What are monads, and what are they good for? Do you even have to code in Haskell to use them? We'll approach monads from a problem-oriented perspective, with discussions of some other important type classes along the way.

If you want to participate in this week's developer lunch, shoot an email to doug@hyperpublic.com to let us know you're coming, and bring lunch to our office at 12:30 on Wednesday, August 3rd. 

http://files.posterous.com/user_profile_pics/390132/dougfacebooksquare.jpg http://posterous.com/users/3FrYNYvW Doug Petkanics the dob Doug Petkanics
Tue, 05 Jul 2011 07:54:48 -0700 Wednesday Developer Lunch http://geocode.hyperpublic.com/wednesday-developer-lunch http://geocode.hyperpublic.com/wednesday-developer-lunch
Solr

At the Hyperpublic office we try to have a "developer lunch" every Wednesday around 12:30. Each week someone from the engineering team presents on a topic of interest or technology that they think the team will find interesting. We generally believe that the more hackers hanging around the office the better, and as such we're happy to host outside developers who are interested in coming by and learning about the topic at hand.

This week @ericxtang will present on the popular open source search server, Apache Solr. He'll cover not only the out-of-the-box features but, as usual with any Hyperpublic tech talk, also the geo-location-specific features and how you may have to modify client libraries in order to get support for the types of location queries you need.

If you're interested in joining just email doug@hyperpublic.com to RSVP, and bring your own lunch at 12:30 to our office.


]]>
http://files.posterous.com/user_profile_pics/390132/dougfacebooksquare.jpg http://posterous.com/users/3FrYNYvW Doug Petkanics the dob Doug Petkanics
Tue, 31 May 2011 14:42:00 -0700 Migrating From PostgreSQL to MongoDB at Hyperpublic http://geocode.hyperpublic.com/migrating-from-postgresql-to-mongodb-at-hyper http://geocode.hyperpublic.com/migrating-from-postgresql-to-mongodb-at-hyper

The engineering team at Hyperpublic has been hard at work over the last few weeks re-architecting our platform in order to migrate from a relational database (PostgreSQL) to a NoSQL datastore (MongoDB). We completed the migration about two weeks ago, and nothing major has broken down so far, so it's about time to do a recap so that the community can benefit from what we learned along the way. This post starts off anecdotal and moves to technical, and hopefully after reading it you'll have a good background on the reasons why you may want/need to migrate and how to go about performing the migration.

Where we came from
The Hyperpublic platform was originally built using Ruby on Rails on the Heroku platform in order to speed development time and iterate quickly. Until we discovered our true utility as an open rich location data platform, there was no sense in over-engineering a custom system for performance and reliability. As a result, our choice of database was made for us by Heroku, which only supports PostgreSQL out of the box. This was fine, as most local objects were being added to our system by our users, and we were only adding objects within a couple of major US cities in order to prove the concept and utility of our platform.

For those of you not familiar with Hyperpublic, what we do is provide a rich data layer on top of local objects. For every real world person, place, or thing we want to be able to provide developers with the object's physical location, tags that describe it, photos, descriptions, and various properties that will be useful to anyone building an application that could use local data. Since the data was modeled relationally, you can probably make some accurate guesses about the database tables that we defined:

  • People
  • Places
  • Things
and each of the above has many different...
  • Locations
  • Images
  • Tags
  • Properties

...among others.

So what was the problem?
When Hyperpublic began to grow, we began building up the data programmatically. As the number of local objects in our database increased and we found our niche as a data provider, we ran into our first two problems of scale.

As you can imagine, in order to return a local object to a user making an API call or viewing our application at Hyperpublic.com, we would have to join on all of the above tables. This was slow. Also, it felt illogical that we would constantly have to join across a normalized data structure to receive images, tags, locations, and properties for a given place, when those images, tags, locations, and properties only belonged to one specific place every time. 
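To make the cost of those joins concrete, here is a minimal sketch in plain Ruby. It is not our actual schema or code: the in-memory arrays stand in for tables, and every name and value in it (PLACES, load_place_relational, "Example Pizzeria", and so on) is hypothetical. It only contrasts gathering child rows from several tables with reading one self-contained document.

```ruby
# Hypothetical normalized rows, one array per table (illustrative names only).
PLACES    = [{ id: 1, name: "Example Pizzeria" }]
LOCATIONS = [{ place_id: 1, lat: 40.7411, lon: -73.9897 }]
TAGS      = [{ place_id: 1, tag: "pizza" }, { place_id: 1, tag: "takeout" }]
IMAGES    = [{ place_id: 1, url: "http://example.com/pizza.jpg" }]

# Relational style: every read gathers rows from each child table.
def load_place_relational(id)
  place = PLACES.find { |p| p[:id] == id }
  return nil unless place

  place.merge(
    locations: LOCATIONS.select { |l| l[:place_id] == id },
    tags:      TAGS.select { |t| t[:place_id] == id }.map { |t| t[:tag] },
    images:    IMAGES.select { |i| i[:place_id] == id }
  )
end

# Document style: the children live inside the parent, so one lookup suffices.
DOCUMENTS = {
  1 => { id: 1, name: "Example Pizzeria",
         locations: [{ lat: 40.7411, lon: -73.9897 }],
         tags: ["pizza", "takeout"],
         images: [{ url: "http://example.com/pizza.jpg" }] }
}

def load_place_document(id)
  DOCUMENTS[id]
end
```

Both functions return the same tags for place 1, but the relational version touches four collections per read while the document version touches one.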

The second problem that we faced was support for geo-spatial queries. Without PostGIS, a geospatial extension for PostgreSQL, proximity, bounding-box/radius, and nearest-neighbor queries require doing math on the stored lat/lon for every point in your system. This means a table scan, and once you get beyond one or two cities' worth of local object data in Hyperpublic, it gets very slow. We began researching and educating ourselves on PostGIS. It is undoubtedly a reasonable option to solve the types of problems we were facing, but the implementation felt less than clean. It felt ugly to program against, and it felt tacked onto Postgres instead of embedded within it from the beginning.
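As a rough illustration of what "doing math on every point" looks like, here is a sketch of a brute-force radius query in plain Ruby using the haversine formula. It is not the query we ran in production; the function names, the sample points, and the 10 km radius are assumptions made for the example.

```ruby
EARTH_RADIUS_KM = 6371.0

def to_rad(deg)
  deg * Math::PI / 180.0
end

# Great-circle distance between two lat/lon points (haversine formula).
def haversine_km(lat1, lon1, lat2, lon2)
  dlat = to_rad(lat2 - lat1)
  dlon = to_rad(lon2 - lon1)
  a = Math.sin(dlat / 2)**2 +
      Math.cos(to_rad(lat1)) * Math.cos(to_rad(lat2)) * Math.sin(dlon / 2)**2
  2 * EARTH_RADIUS_KM * Math.asin(Math.sqrt(a))
end

# Brute force: the distance is computed for every stored point, which is
# the in-memory equivalent of a full table scan.
def within_radius(points, lat, lon, radius_km)
  points.select { |p| haversine_km(lat, lon, p[:lat], p[:lon]) <= radius_km }
end

# Hypothetical sample data.
points = [
  { name: "a", lat: 40.7411, lon: -73.9897 },   # Manhattan
  { name: "b", lat: 40.6782, lon: -73.9442 },   # Brooklyn
  { name: "c", lat: 34.0522, lon: -118.2437 }   # Los Angeles
]
nearby = within_radius(points, 40.7411, -73.9897, 10)
```

The work grows linearly with the number of stored points, regardless of how few of them are actually near the query location.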

Enter MongoDB
While evaluating solutions for the above problems we were looking for a database that could store the arbitrary properties and undefined quantities of metadata along with each object. This is the prototypical use case for a NoSQL datastore. Additionally, we were looking for geo-spatial index and query support: one of the oft-focused-upon features of MongoDB. We knew that the 10gen team was here in NYC as we've participated in many events and conferences in which they've been present, and they're very supportive of startups building on their technology. The choice to migrate to MongoDB was obvious, given that it supports all of the near-term features that we require from a datastore. Then it was just a matter of how we would do the migration...

How we migrated
(Note - this section is somewhat Rails/ActiveRecord/Mongoid heavy, but you can replace these terms with the framework and ORM/ODM of your choice).

Step 0 - Have a unit test suite with good coverage on your models and make sure all your tests pass.

Step 1 - Write scripts to copy your data into MongoDB. Do not delete/change anything in the current schema.

In a non-optimized Rails application, the closest that you get to your database is by writing your model classes using the ActiveRecord object-relational mapper. When switching to MongoDB we chose to use the Mongoid object-document mapper as a not-quite-drop-in replacement for ActiveRecord. Our migration script first namespaced the Mongoid objects, defined the collections they would be stored in, and defined the fields that would be mapped in each collection. It looked something like this...

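# Namespaced under MongoHP, which keeps these classes distinct from the
# existing ActiveRecord models (e.g. ::Place) while both are loaded.
module MongoHP
  class HPObject
    include Mongoid::Document
    include Mongoid::Timestamps   # adds created_at / updated_at
    store_in :hp_objects          # collection these documents are stored in

    field :object_type, :type => String
    field :tags, :type => Array
    ...

    # Embeds: images and locations are stored inside the parent document
    embeds_many :images, :as => :photo_owner, :class_name => "MongoHP::Image"
    embeds_many :locations, :class_name => "MongoHP::Location"

    ...
  end

  class Location
    include Mongoid::Document
    include Mongoid::Timestamps
    # Inverse side of HPObject's embeds_many :locations
    embedded_in :hp_object, :class_name => "MongoHP::HPObject"
  end

  class Image
    ...
  end
end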

The goal of the data migration scripts is to instantiate your ActiveRecord objects and then insert them into Mongo using Mongoid objects. (You can bypass Mongoid and go straight to MongoDB using the Ruby driver, but Mongoid gives you some conveniences like relationships and timestamps.) We opted to always instantiate the objects, but we did so using a queue and background jobs so that we wouldn't exceed the memory on the server by instantiating every single object in one process. Our script would basically just map the data from AR to Mongoid like so...

  class CopyPlaceJob
    # Name of the queue this background job is placed on
    @queue = :general

    # Copies a single Place, identified by its old relational ID
    def self.perform(id)
      old_place = ::Place.find(id)   # ActiveRecord model
      # Look up the already-migrated owner by its old relational ID
      mongo_user = MongoHP::User.first(:conditions => {:old_id => old_place.user_id})
      if mongo_user
        place = mongo_user.hp_objects.create(:old_id => old_place.id,
                                             :object_type => "Place",
                                             :display_name => old_place.name,
                                             :description => old_place.description,
                                             :phone_number => old_place.phone_number,
                                             :website => old_place.website,
                                             ...)
        # Copy the child rows into the new document
        MigrateToMongo.migrate_object_tags(place, old_place)
        MigrateToMongo.migrate_object_images(place, old_place)
        MigrateToMongo.migrate_object_locations(place, old_place)
      end
    end
  end

Notice that we are copying the old_id from AR to Mongoid. This has proven useful many times over when needing to do lookups after the fact, and I recommend you keep it around for a while until you are certain that you'll never need it again. Regarding IDs, keep in mind that your object IDs will change. You'll need to update any external resources that refer to the old object IDs. For example, our Amazon S3 was configured to store photos in buckets named after the object's ID.
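Here is one way the old_id could be used to fix up external references, sketched in plain Ruby. Everything specific in it is an assumption for illustration: the "photos/<old_id>/<filename>" key layout, the sample IDs, and the helper names build_id_map and rewrite_key are hypothetical and do not describe our actual S3 setup.

```ruby
# Hypothetical migrated documents: each keeps the relational primary key
# as old_id next to its new document ID.
migrated = [
  { _id: "4dd2f1a0e1382375c7000001", old_id: 17 },
  { _id: "4dd2f1a0e1382375c7000002", old_id: 42 }
]

# Build a lookup from old relational ID to new document ID.
def build_id_map(docs)
  docs.each_with_object({}) { |doc, map| map[doc[:old_id]] = doc[:_id] }
end

# Rewrite a storage key of the assumed form "photos/<old_id>/<filename>".
# Keys with an unknown ID, or a different layout, are returned unchanged.
def rewrite_key(key, id_map)
  key.sub(%r{\Aphotos/(\d+)/}) do
    new_id = id_map[Regexp.last_match(1).to_i]
    new_id ? "photos/#{new_id}/" : Regexp.last_match(0)
  end
end

id_map = build_id_map(migrated)
new_key = rewrite_key("photos/17/front.jpg", id_map)
```

Leaving unknown keys untouched means the rewrite can be run repeatedly and checked before anything is moved or deleted.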

After your script is written, you should be able to safely run it on a copy of your production dataset as a test, since it won't modify or delete any production data.
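The queueing idea from this step can be sketched with nothing but Ruby's standard library. This is an in-process stand-in, not the background job system we used: migrate_in_background and its worker_count parameter are hypothetical names, and the block passed in stands for "load one record and copy it". The point is that only IDs are enqueued, so no process ever holds every object in memory at once.

```ruby
# In-process stand-in for a job queue: only IDs are enqueued, never records.
def migrate_in_background(ids, worker_count: 2)
  queue = Queue.new
  ids.each { |id| queue << id }
  queue.close                     # lets workers stop once the queue drains

  results = Queue.new
  workers = Array.new(worker_count) do
    Thread.new do
      # pop returns nil once the closed queue is empty, which ends the loop
      while (id = queue.pop)
        results << yield(id)      # load and copy exactly one record
      end
    end
  end
  workers.each(&:join)

  copied = []
  copied << results.pop until results.empty?
  copied
end

# Usage: the block would normally find the old record and create the new one.
copied_ids = migrate_in_background([1, 2, 3]) { |id| id * 10 }
```

The order of the results depends on thread scheduling, so callers should not rely on it.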

Step 2 - Update your application
At this point, you'll want to create another branch to update your application. The reason is that the migration needs to reference the ActiveRecord models in the current application, so you can't delete or modify them until all the data is copied over. On a separate branch, you can update your models to use Mongoid instead of ActiveRecord. The meat of this process can be copied over from the models you created during your data migration. When you do eventually get your application working again with the updated models, update and run your unit tests to make sure that they all pass.

Step 3 - Configure your production MongoDB environment
We won't go into details here, but we recommend at least a three-node replica set configuration. MongoDB has a good writeup on how to set this up here.

Step 4 - Deploy to production
Back up your database and take your application down for maintenance so that no writes come in after the data migration. Deploy the branch with your data migration and run it. Assuming all goes well, deploy the second branch with your updated application. Restart your application and you'll be up and running on MongoDB.

The results
I don't have hard numbers so this is going to be more anecdotal than scientific, but after migrating to MongoDB the Hyperpublic platform was immediately faster and more scalable than it was previously. We've seen over 5x speed increases in the user facing application and 20x speed increases within our API. 

Geospatial queries now use the indexes computed ahead of time and return instantly instead of doing table scans and distance computing math on each query. 
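To show why an index computed ahead of time avoids the scan, here is a deliberately simplified sketch in plain Ruby. It is a fixed grid of cells, not MongoDB's actual geospatial index, and the cell size, function names, and sample points are all assumptions made for the example.

```ruby
CELL_DEG = 0.1  # assumed cell size in degrees; purely illustrative

# Map a coordinate to the grid cell that contains it.
def cell_for(lat, lon)
  [(lat / CELL_DEG).floor, (lon / CELL_DEG).floor]
end

# Build the index once, ahead of query time: cell => points in that cell.
def build_grid_index(points)
  points.group_by { |p| cell_for(p[:lat], p[:lon]) }
end

# Candidate lookup: only the query's cell and its eight neighbours are read,
# no matter how many points exist elsewhere.
def candidates_near(index, lat, lon)
  row, col = cell_for(lat, lon)
  (-1..1).flat_map do |dr|
    (-1..1).flat_map { |dc| index.fetch([row + dr, col + dc], []) }
  end
end

# Hypothetical sample data.
points = [
  { name: "a", lat: 40.7411, lon: -73.9897 },   # Manhattan
  { name: "b", lat: 40.6782, lon: -73.9442 },   # Brooklyn
  { name: "c", lat: 34.0522, lon: -118.2437 }   # Los Angeles
]
index = build_grid_index(points)
found = candidates_near(index, 40.7411, -73.9897)
```

The candidates would still need an exact distance check afterwards, but that check now runs over a handful of points instead of the whole dataset.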

Loading all of the metadata associated with a local object is now completed without any joins, and the number of queries per page was reduced dramatically.

We went from 2 cities' worth of data pushed into the production system as a proof of concept, to 10+ cities' worth of data pushed in and usable by third party developers off of our live platform with no performance bottlenecks in sight in the near term.

Gotchas and lessons learned
10gen is iterating very quickly on MongoDB, and as a result programming against it is sometimes a moving target. Here are some of the lessons learned along the way during the migration:

  • MongoDB has great support for geo-spatial indexes, but if you want multiple locations indexed per document, you'll have to use MongoDB 1.9. Our "People" can have multiple locations - where they live, where they work, etc - so this was a requirement for us. 1.9 is supposedly unstable and not recommended for production use, but we have been quite happy with it.
  • The ODMs likely won't be fully featured, up-to-date, or drop-in replacements for the ORM you may have been using with PostgreSQL. If you're a beginner, I recommend getting very familiar with the MongoDB driver for the language of your choice, as you'll frequently have to drop down to it directly.
  • Do not try to model things relationally in MongoDB. If your problem is suited to relational modeling, then stick with a relational DB. 
  • Keep old Postgres IDs around. You frequently have to refer to them when doing in-memory "joins" with your old relational data or when referring to legacy data stored in external services.
  • The MongoDB community and 10gen are very helpful. Talk to people at local meetups and conferences, and they're usually happy to help you with any issues you have in your migration to MongoDB.

I hope this post was useful. If you're migrating to MongoDB and have any questions or could use some help, drop me a line anytime @petkanics on twitter.


]]>
http://files.posterous.com/user_profile_pics/390132/dougfacebooksquare.jpg http://posterous.com/users/3FrYNYvW Doug Petkanics the dob Doug Petkanics
Mon, 16 May 2011 16:06:00 -0700 Look for Hyperpublic at Hackdisrupt http://geocode.hyperpublic.com/look-for-hyperpublic-at-hackdisrupt http://geocode.hyperpublic.com/look-for-hyperpublic-at-hackdisrupt

What's up people? This weekend, May 21-22, the Hyperpublic engineering team will be hanging out at the TechCrunch Disrupt Hackathon in New York City at Pier 94. We'll present our developer platform on Saturday afternoon to anyone who's interested in learning how to build an application which leverages rich local data.

We've been looking forward to this event for months. There will be 500 hackers who will form teams and build projects over a 24-hour period, competing for prizes, glory, and the honor that comes with completing 24 hours of coding fueled exclusively by pizza and energy drinks. If you see @ericxtang, ask him to brew you up a Red Monster.

All the details regarding how to participate in and sign up for the event can be found here. What are you planning to build?




]]>
http://files.posterous.com/user_profile_pics/390132/dougfacebooksquare.jpg http://posterous.com/users/3FrYNYvW Doug Petkanics the dob Doug Petkanics