About our data sources

Comprehensive guide to On-Tap's data sources, processing pipeline, and data quality standards for brewery, winery, and distillery information.

📅 September 1, 2025 ⏱️ 8 min read ✍️ On-Tap Team


This article explains where our venue data comes from, how we process it, how often we refresh it, and how you can help keep it accurate. In short: we stand on the shoulders of open data communities and transform their datasets to power a fast, map‑first experience for beer, wine, and spirits venues.

What we use

We do not scrape private data, track individuals, or purchase commercial listings. Our emphasis is on open, auditable sources.

Source breakdown

Processing pipeline (high‑level)

  1. Ingest: We import fresh OpenStreetMap (OSM) planet extracts into a columnar analytical store for filtering and transformation.
  2. Filter: We select only the nodes/ways/relations carrying relevant tags (e.g., brewery, pub, wine‑bar, distillery, wine‑shop, beer‑store, bar).
  3. Normalize: We standardize names, categories, and addresses; we derive display‑friendly fields; we compute geospatial centroids for map rendering.
  4. Enrich: When possible, we link venues to producer references (e.g., a brewery venue to a known brewery entity) to improve search and deduplication.
  5. Export: We produce compact artifacts optimized for fast client‑side maps and server‑side APIs.
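
As a rough illustration of the Filter and Normalize steps, here is a minimal Python sketch. It is not the actual cc.on-tap.data code. The tag pairs are real OSM tags, but the exact set On-Tap selects is an assumption, and the element shape (`id`, `tags`, `points`) and the `Venue` type are hypothetical. The centroid is approximated as the mean of the vertices, which is simpler than a true area-weighted polygon centroid.

```python
from dataclasses import dataclass

# Assumed filter set: these are real OSM key/value pairs, but whether
# On-Tap selects exactly these is an assumption made for illustration.
RELEVANT_TAGS = {
    ("craft", "brewery"),
    ("craft", "distillery"),
    ("craft", "winery"),
    ("amenity", "pub"),
    ("amenity", "bar"),
    ("shop", "wine"),
}


@dataclass
class Venue:
    """Hypothetical normalized record; field names are illustrative."""
    osm_id: int
    name: str
    lat: float
    lon: float
    tags: dict


def is_relevant(tags):
    """Filter step: keep elements carrying at least one relevant tag."""
    return any((key, value) in RELEVANT_TAGS for key, value in tags.items())


def normalize_name(raw):
    """Normalize step: collapse runs of whitespace and trim both ends."""
    return " ".join(raw.split())


def centroid(points):
    """Approximate a centroid as the mean of (lat, lon) vertices.

    A simplification: a true polygon centroid would weight by area.
    """
    lats = [p[0] for p in points]
    lons = [p[1] for p in points]
    return sum(lats) / len(lats), sum(lons) / len(lons)


def to_venues(elements):
    """Run filter and normalize over already-ingested elements."""
    venues = []
    for el in elements:
        # Skip irrelevant elements and those without any geometry.
        if not is_relevant(el["tags"]) or not el["points"]:
            continue
        lat, lon = centroid(el["points"])
        name = normalize_name(el["tags"].get("name", ""))
        venues.append(Venue(el["id"], name, lat, lon, el["tags"]))
    return venues
```

A node is simply an element with a single point, so the same `centroid` function covers nodes and ways alike in this sketch.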

The repository layout reflects this split: the cc.on-tap.data project handles ingestion, filtering, and export; the cc.on-tap web app consumes those exports for the map and directory UI.

Categories and tagging

We map common OSM tags into a small set of app categories for clarity:

If you notice a venue mis‑categorized, it’s often due to upstream tags. Updating tags in OSM is the most durable fix and benefits everyone relying on open data.
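
To show why upstream tags decide the category a venue lands in, here is a small Python sketch of a tag-to-category lookup. The category names and the precedence order are hypothetical, not On-Tap's actual mapping. The OSM key/value pairs themselves are real tags.

```python
# Hypothetical mapping from OSM tags to app categories. Order matters:
# earlier entries win, so a brewpub tagged with both craft=brewery and
# amenity=pub is categorized as a brewery in this sketch.
TAG_TO_CATEGORY = {
    ("craft", "brewery"): "brewery",
    ("craft", "winery"): "winery",
    ("craft", "distillery"): "distillery",
    ("shop", "wine"): "wine-shop",
    ("amenity", "pub"): "pub",
    ("amenity", "bar"): "bar",
}


def categorize(tags, default="other"):
    """Return the first matching app category, or `default` if none match."""
    for (key, value), category in TAG_TO_CATEGORY.items():
        if tags.get(key) == value:
            return category
    return default
```

Under this kind of lookup, a brewery tagged upstream only as `amenity=pub` would appear as a pub until its OSM tags are corrected.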

Update cadence

Data quality and corrections

Privacy

We do not collect personal data about visitors’ locations beyond what is strictly necessary for basic geolocation features (e.g., “find near me”), and we do not store individualized movement histories. Venue data is public and non‑personal by nature.

Licensing and attribution

If you reuse our compiled exports, please retain source attributions and include links back to the original projects.

Changelog for this article

📂 documentation 📂 data-science #data #open-source #OpenStreetMap #transparency