Again, the programming language used in this project is Python, together with MongoDB for analyzing the map dataset. Students are expected to demonstrate that they can assess the quality of the data and update it where possible.
Besides the programming techniques, one interesting thing I learned from this project is the vision behind OSM. I had always regarded Google Maps as a very open platform and, frankly, had never thought about copyright. In fact, Google decides what can be shown on the map. For example, if you write an Android application with the Google Maps API for Android, you can display a map fragment on your screen and perhaps add some overlays of your own. However, you can never access the raw map data. In short, you can use Google Maps data, possibly for free, but only under restrictions imposed by Google. The blog post by Serge Wroclawski is an interesting read on this topic.
As far as I know, there are two ways to get the OSM data for a city. The first is mapzen.com, which lets you select a city and download the corresponding OSM data. The second is the Overpass API, which takes the latitude and longitude bounds of a rectangular area. Whichever way you choose, the downloaded OSM data is an XML file like the following:
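For the Overpass route, a request can be expressed as an Overpass QL query sent to an interpreter endpoint. The sketch below only builds the URL; it assumes the public endpoint `https://overpass-api.de/api/interpreter` and uses one common query form (all nodes in the box plus the ways and relations that reference them, with `out meta` so user and changeset attributes are included). The helper names and the Hong Kong bounds are hypothetical and purely illustrative.

```python
from urllib.parse import urlencode

# Public Overpass API endpoint (assumed; other mirrors exist).
OVERPASS_URL = "https://overpass-api.de/api/interpreter"


def overpass_bbox_query(south, west, north, east):
    """Hypothetical helper: Overpass QL for all nodes in a bounding box,
    plus the ways/relations that use them, returned as OSM XML with metadata."""
    # Overpass QL bounding boxes are ordered (south, west, north, east).
    return (
        "[out:xml];"
        f"(node({south},{west},{north},{east});<;);"
        "out meta;"
    )


def overpass_bbox_url(south, west, north, east):
    """Hypothetical helper: GET URL carrying the query in the 'data' parameter."""
    return OVERPASS_URL + "?" + urlencode({"data": overpass_bbox_query(south, west, north, east)})


# Rough, illustrative bounds around Hong Kong (not the project's actual values).
url = overpass_bbox_url(22.15, 113.83, 22.56, 114.44)
print(url)
# To download (network access required):
#   import urllib.request; urllib.request.urlretrieve(url, "hongkong.osm")
```

Large areas may exceed the public server's limits, which is one reason a pre-built city extract can be more convenient.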
<?xml version="1.0" encoding="UTF-8"?>
<osm>
  <node changeset="19883770" id="274901" lat="22.3460512" lon="114.1811521" ... />
  <way changeset="25914878" id="4187007" timestamp="2014-10-07T10:55:01Z" ...>
    <nd ref="3049050712" />
    <nd ref="3049050713" />
    <nd ref="2481700725" />
  </way>
  ...
Walk through the OSM XML elements
I am only interested in the <node> and <way> elements of the OSM data for the project requirements. Python's xml.etree.ElementTree module makes it easy to walk through the XML tree incrementally. In the following snippet I only need to process the "start" event:

import xml.etree.ElementTree as ET  # cElementTree was removed in Python 3.9

# osmfile is the path to the downloaded OSM XML file
with open(osmfile, "rb") as osm_file:
    for event, elem in ET.iterparse(osm_file, events=("start",)):
        if elem.tag in ("node", "way"):
            pass  # process the element here
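As a self-contained way to try the iterparse loop without downloading anything, the sketch below counts how often each element tag occurs in a small in-memory copy of the XML excerpt (the elided attributes are simply left out). The `count_tags` name is hypothetical; `io.BytesIO` stands in for the open file.

```python
import io
import xml.etree.ElementTree as ET
from collections import Counter

# Small in-memory stand-in for a downloaded .osm file.
SAMPLE_OSM = b"""<?xml version="1.0" encoding="UTF-8"?>
<osm>
  <node changeset="19883770" id="274901" lat="22.3460512" lon="114.1811521"/>
  <way changeset="25914878" id="4187007" timestamp="2014-10-07T10:55:01Z">
    <nd ref="3049050712"/>
    <nd ref="3049050713"/>
    <nd ref="2481700725"/>
  </way>
</osm>"""


def count_tags(source):
    """Hypothetical helper: count element tags while streaming with iterparse."""
    counts = Counter()
    # A "start" event is enough here: only the tag name is needed.
    for _, elem in ET.iterparse(source, events=("start",)):
        counts[elem.tag] += 1
    return counts


print(count_tags(io.BytesIO(SAMPLE_OSM)))  # 1 osm, 1 node, 1 way, 3 nd
```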
In the project, I need to read the data and write it to a JSON file, which can then be imported into a MongoDB database for further queries.
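A minimal sketch of that conversion, assuming a document shape that matches the fields queried later (`doc_type`, and `created.user` under a `created` sub-document); the other field names (`pos`, `node_refs`), the helper names, and the `user` values in the sample are hypothetical. It listens for "end" events rather than "start": the ElementTree documentation notes that children may not yet be present at a "start" event, and a way's <nd> children are needed here.

```python
import io
import json
import xml.etree.ElementTree as ET

# Abbreviated sample; the "user" attributes are made up for illustration.
SAMPLE_OSM = b"""<?xml version="1.0" encoding="UTF-8"?>
<osm>
  <node changeset="19883770" id="274901" lat="22.3460512" lon="114.1811521" user="user_a"/>
  <way changeset="25914878" id="4187007" timestamp="2014-10-07T10:55:01Z" user="user_b">
    <nd ref="3049050712"/>
    <nd ref="3049050713"/>
    <nd ref="2481700725"/>
  </way>
</osm>"""

# Attributes grouped under "created" (assumed shape, mirroring created.user).
CREATED_KEYS = ("version", "changeset", "timestamp", "user", "uid")


def shape_element(elem):
    """Hypothetical helper: turn a <node> or <way> into a dict, else return None."""
    if elem.tag not in ("node", "way"):
        return None
    doc = {"doc_type": elem.tag, "created": {}}
    for key, value in elem.attrib.items():
        if key in CREATED_KEYS:
            doc["created"][key] = value
        elif key not in ("lat", "lon"):
            doc[key] = value
    if "lat" in elem.attrib and "lon" in elem.attrib:
        # Store the position as a [lat, lon] pair of floats.
        doc["pos"] = [float(elem.attrib["lat"]), float(elem.attrib["lon"])]
    if elem.tag == "way":
        doc["node_refs"] = [nd.attrib["ref"] for nd in elem.findall("nd")]
    return doc


def osm_to_json_lines(source, out):
    """Hypothetical helper: stream OSM XML and write one JSON document per line."""
    # "end" events guarantee a way's <nd> children have already been parsed.
    for _, elem in ET.iterparse(source, events=("end",)):
        doc = shape_element(elem)
        if doc is not None:
            out.write(json.dumps(doc) + "\n")
            elem.clear()  # drop the element's children and attributes to save memory


buf = io.StringIO()
osm_to_json_lines(io.BytesIO(SAMPLE_OSM), buf)
print(buf.getvalue())
```

Writing one document per line suits mongoimport, which reads newline-delimited JSON by default, e.g. `mongoimport --db osm --collection hongkong --file hongkong.json` (the database and file names here are hypothetical).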
Examine the MongoDB data
I think the most useful knowledge I gained from this project relates to MongoDB: basic operations such as queries and updates, as well as the aggregation pipeline. The pymongo module is a MongoDB client for Python. To connect to a locally running MongoDB instance, I just need the following:

from pymongo import MongoClient

# Get the MongoDB database instance by name
def get_db(db_name):
    client = MongoClient('localhost:27017')
    db = client[db_name]
    return db
I can then run queries and aggregations such as the following:
# Number of nodes
> db.hongkong.count_documents({'doc_type': 'node'})
877075
# Number of ways
> db.hongkong.count_documents({'doc_type': 'way'})
93051
# Number of unique users
> len(db.hongkong.distinct('created.user'))
704
# Top 10 contributing users
> list(db.hongkong.aggregate([
      {'$group': {'_id': '$created.user', 'count': {'$sum': 1}}},
      {'$sort': {'count': -1}},
      {'$limit': 10}]))
[{u'_id': u'xxxx', u'count': 510158}, {u'_id': u'yyyy', u'count': 77302}, ...]
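To make the semantics of the three pipeline stages concrete, here is a pure-Python analogue over an in-memory list of documents. It is only an illustration of what `$group`, `$sort` and `$limit` compute, not how pymongo runs the pipeline (the server does that); the function name and sample documents are hypothetical.

```python
from collections import Counter


def top_contributors(docs, limit=10):
    """Hypothetical analogue of [$group by created.user with {'$sum': 1},
    $sort {'count': -1}, $limit] applied to a list of dicts."""
    # $group: documents without created.user fall into a None group,
    # similar to MongoDB grouping a missing field under null.
    counts = Counter(doc.get("created", {}).get("user") for doc in docs)
    # $sort {'count': -1}: sorted() is stable, so ties keep first-seen order;
    # MongoDB does not guarantee any order among equal counts.
    ranked = sorted(counts.items(), key=lambda item: item[1], reverse=True)
    # $limit: keep only the first `limit` groups.
    return [{"_id": user, "count": n} for user, n in ranked[:limit]]


docs = [{"created": {"user": u}} for u in ["a", "b", "a", "c", "a", "c"]]
print(top_contributors(docs, limit=2))  # [{'_id': 'a', 'count': 3}, {'_id': 'c', 'count': 2}]
```

On the real collection the server-side pipeline is the better choice, since it avoids pulling hundreds of thousands of documents into Python.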