Latest comment: 8 days ago by Jpolvto in topic Scripts


OpenStreetMap integration

[edit]

OpenStreetMap offers significantly more data that could enhance our trail guides, though it’s unclear what licensing or usage agreements would apply in these cases. Examples of useful data include:

  • Distance (km) for each stage
  • Elevation change and elevation profiles
  • Estimated hiking time
  • Complete trail mappings, including sub-relations

It’s worth noting that for most long-distance trails covered on Wikivoyage, the corresponding OpenStreetMap relations are already segmented in a way that aligns well with our guide format. The main exceptions are the Appalachian Trail and the GR 10, which are not divided into logical sections (for our purposes). Jpolvto (talk) 16:58, 4 May 2025 (UTC)Reply

The OSM licences are designed to make reuse possible, but I haven't checked their licensing terms in detail. Much of what we use are data collections rather than literature (maps are counted as such, at least in Finland), and data, I think, isn't protected in the USA. In EU there is the neighbouring database right protecting it, so we should respect the licensing terms (otherwise our EU contributors may have difficulties editing that aspect of our articles). Some uses require attribution; note the "OSM contributors" attribution included by default at the bottom of our dynamic maps. –LPfi (talk) 09:56, 5 May 2025 (UTC)Reply

Scripts

[edit]

For fetching data:

#!/usr/bin/env python3 """ Download + cache the elevation profile for an OSM hiking super‑route.  The pickled result looks like:     {         "df":     pandas.DataFrame,   # elevation profile (dist_km / elev_m)         "breaks": list[float],        # stage‑boundary distances, *including 0.0*         "labels": list[str],          # place‑names, same length as breaks     }  USAGE     python fetch_profile.py                  # default Lapplandsleden     python fetch_profile.py -r 123456 -o mytrail.pkl """  import argparse import itertools import math import pickle import random import time from pathlib import Path  import overpy import pandas as pd import requests from pyproj import Geod from shapely.geometry import LineString from shapely.ops import linemerge  # ─── static config ─────────────────────────────────────────────────── # DENSIFY_SPACING_M = 50 DEM_ENDPOINT = "https://api.opentopodata.org/v1/mapzen"  # global 90 m DEM_BATCH_LIMIT = 50 DELAY_BETWEEN_CALLS = 1.0 MAX_DEM_RETRIES = 4 BASE_BACKOFF_S = 2.5 DEFAULT_RELATION = 13439996     # Lapplandsleden super‑route DEFAULT_OUTFILE = Path("lapplandsleden_profile.pkl") # ───────────────────────────────────────────────────────────────────── #  api = overpy.Overpass(max_retry_count=3, retry_timeout=30) geod = Geod(ellps="WGS84")   def chunks(it, size):     """Yield successive *size*‑element chunks from *it*."""     it = iter(it)     while (batch := list(itertools.islice(it, size))):         yield batch   def fetch_elevations(points):     """Call the DEM API once for the provided shapely Points."""     locs = "|".join(f"{p.y},{p.x}" for p in points)     url = f"{DEM_ENDPOINT}?locations={locs}"      for attempt in range(MAX_DEM_RETRIES):         r = requests.get(url, timeout=20)          if r.ok and r.headers.get("content-type", "").startswith("application/json"):             return [res.get("elevation") for res in r.json()["results"]]          if r.status_code in (429, 503):            # throttled / service busy             back = BASE_BACKOFF_S * (1.7 ** attempt) + random.uniform(0, 0.5)             print(f"[elev] {r.status_code} – sleeping {back:.1f}s")             time.sleep(back)             continue          raise RuntimeError(             f"DEM call failed {r.status_code}: {r.text[:120]!r}")      raise RuntimeError("DEM retries exhausted")   def stage_relations(super_id: int):     """Return the stage relations that make up an OSM super‑route, in their defined order."""     # 1. Fetch the super-relation itself to get its ordered list of members     query_super_rel = f"""     [out:json][timeout:120];     relation({super_id});     out body;  // 'body' includes the relation's members in their defined order     """      result_super_rel = api.query(query_super_rel)      if not result_super_rel.relations:         raise RuntimeError(f"Super-relation {super_id} not found.")      super_relation = result_super_rel.relations[0]      stage_rel_ids_ordered = []     for member in super_relation.members:         # CORRECTED LINE: Use isinstance to check the type of the member object.         # This assumes 'RelationRelation' is exposed by 'import overpy' as 'overpy.RelationRelation'         # which is standard in recent versions.         if isinstance(member, overpy.RelationRelation):             # 'ref' attribute gives the ID of the member             stage_rel_ids_ordered.append(member.ref)      if not stage_rel_ids_ordered:         raise RuntimeError(             f"No stage relations found as members of the super-relation {super_id}.")      # 2. Fetch the details (IDs and tags) for these stage relations     # (The rest of the function remains the same as the previous corrected version)      id_query_part = "".join(         [f"relation({rel_id});" for rel_id in stage_rel_ids_ordered])      query_stages = f"""     [out:json][timeout:120];     (       {id_query_part}     );     out ids tags;     """      result_stages = api.query(query_stages)      rels_by_id = {rel.id: rel for rel in result_stages.relations}      ordered_rels = []     for rel_id in stage_rel_ids_ordered:         if rel_id in rels_by_id:             ordered_rels.append(rels_by_id[rel_id])         else:             print(                 f"[warning] Stage relation {rel_id} (member of {super_id}) was not found or couldn't be fetched.")      if not ordered_rels:         raise RuntimeError(             f"Although {len(stage_rel_ids_ordered)} stage relation(s) were listed in super-relation {super_id}, none could be fetched.")      return ordered_rels   def relation_line(rel_id: int):     """Fetch full geometry of a relation as a merged LineString."""     q = f"""     [out:json][timeout:120];     relation({rel_id});     way(r)->.w; (.w;>;); out geom;     """     ways = api.query(q).ways     return linemerge([LineString([(n.lon, n.lat) for n in w.nodes]) for w in ways])   def densify(line, spacing=DENSIFY_SPACING_M):     """Return evenly‑spaced points along *line* (shapely LineString)."""     length = geod.geometry_length(line)     steps = max(1, math.ceil(length / spacing))     return [line.interpolate(t / steps, normalized=True) for t in range(steps + 1)]   def build_profile(super_id: int):     """Build the distance/elevation profile for a super‑route relation."""     rows = []     breaks = [0.0]          # start at 0 km     labels = []             # will capture the first 'from' label here     total_m = 0.0      for rel in stage_relations(super_id):         sid = rel.id         sfrom = rel.tags.get("from", "?")         sto = rel.tags.get("to",   "?")          if not labels:                      # first loop = trail start             labels.append(sfrom)            # aligns with 0 km in breaks         print(f"[stage] {sid}: {sfrom}{sto}")          pts = densify(relation_line(sid))          elev = []         for batch in chunks(pts, DEM_BATCH_LIMIT):             elev.extend(fetch_elevations(batch))             time.sleep(DELAY_BETWEEN_CALLS)          seg_m = 0.0         for i, (p, z) in enumerate(zip(pts, elev)):             if i:                 seg_m += geod.inv(pts[i-1].x, pts[i-1].y, p.x, p.y)[2]             rows.append({                 "dist_km": (total_m + seg_m) / 1_000,                 "elev_m":  z,             })          total_m += geod.geometry_length(relation_line(sid))         breaks.append(total_m / 1_000)      # arrival at stage finish         labels.append(sto)      df = (pd.DataFrame(rows)             .infer_objects(copy=False)             .apply(pd.to_numeric, errors="coerce"))     df["elev_m"] = (df["elev_m"]                     .interpolate(limit_direction="both")                     .bfill().ffill()                     .round()                     .astype("Int64"))      return {         "df":     df,         "breaks": breaks,         "labels": labels,     }   def main():     cli = argparse.ArgumentParser(         description="Fetch & cache trail elevation profile.")     cli.add_argument("-r", "--relation", type=int, default=DEFAULT_RELATION,                      help="OSM relation ID of the super‑route")     cli.add_argument("-o", "--outfile", type=Path, default=DEFAULT_OUTFILE,                      help="Where to store the pickled profile")     args = cli.parse_args()      bundle = build_profile(args.relation)     with open(args.outfile, "wb") as f:         pickle.dump(bundle, f)     print(f"[saved] {args.outfile}  ({bundle['df'].shape[0]:,} points)")   if __name__ == "__main__":     main() 

For creating wikitext/matplotlib elevation profile:

#!/usr/bin/env python3 """ Emit Wikivoyage‑ready stage descriptions **and** save a Matplotlib elevation profile PNG (with vertical annotations).  Design goals ============ * **Single frame:** Only one X‑axis and one Y‑axis, intersecting at the origin   (0 km, 0 m).  Top and right spines remain hidden. * **Numeric kilometre ticks preserved** on the bottom axis. * **Stage labels shown on a secondary (invisible) top axis** so they don’t   crowd the profile and numeric ticks coexist. * **No raster/grid** background.  Run ``python plot_profile.py -h`` for CLI details. """ from __future__ import annotations  import argparse import pickle from pathlib import Path from typing import List  import matplotlib.pyplot as plt from matplotlib import ticker import pandas as pd from scipy.signal import savgol_filter  # --------------------------------------------------------------------------- # Constants -----------------------------------------------------------------  DEFAULT_INFILE = Path("lapplandsleden_profile.pkl") DEFAULT_IMG = Path("profile.png") HALF_CHAR = "½"                     # typographic half‑symbol SMOOTH_WIN = 81                      # global Savitzky–Golay window (odd) SMOOTH_ORDER = 2                     # polynomial order for S‑G filter  # --------------------------------------------------------------------------- # Formatting helpers --------------------------------------------------------   def format_km(km: float, decimals: int = 1) -> str:     return f"{km:.{decimals}f}".rstrip("0").rstrip(".")   def format_hours(hours: float) -> str:     hrs = round(hours * 2) / 2  # nearest 0.5 h     whole = int(hrs)     return f"{whole}\u202Fh" if hrs % 1 == 0 else f"{whole}{HALF_CHAR}\u202Fh"   def duration_naismith(dist_km: float, gain_m: float) -> float:     """Naismith rule: 4 km h⁻¹ on the flat + 600 m ascent h⁻¹."""     return dist_km / 4.0 + gain_m / 600.0  # --------------------------------------------------------------------------- # Stage description generator ----------------------------------------------   def build_descriptions(df: pd.DataFrame, breaks: List[float], labels: List[str]) -> str:     """Return Wikivoyage‑style === PlaceA – PlaceB === blocks."""     start_dists = breaks[:-1]     end_dists = breaks[1:]     start_names = labels[:-1]     end_names = labels[1:]      thr = 1  # ignore ±1 m noise in cumulative gain/loss     blocks: list[str] = []      for s_name, e_name, d0, d1 in zip(start_names, end_names, start_dists, end_dists):         seg = df.loc[(df["dist_km"] >= d0) & (             df["dist_km"] <= d1)].reset_index(drop=True)         dist_km = d1 - d0         diffs = seg["elev_smoothed"].diff().fillna(0)         gain_m = int(round(diffs[diffs > thr].sum()))         loss_m = int(round(-diffs[diffs < -thr].sum()))         time_h = duration_naismith(dist_km, gain_m)          blocks.append(             f"=== {s_name}{e_name} ===\n"             f"; Distance : {format_km(dist_km)}\u202Fkm\n"             f"; Duration : {format_hours(time_h)}\n"             f"; Elevation gain : ↥\u202F{gain_m}\u202Fm\n"             f"; Elevation loss : ↧\u202F{loss_m}\u202Fm\n"         )      return "\n".join(blocks)  # --------------------------------------------------------------------------- # Matplotlib chart builder ---------------------------------------------------   def make_matplotlib_chart(     df: pd.DataFrame,     breaks: List[float],     labels: List[str],     *,     outfile: Path = DEFAULT_IMG,     width_in: float = 8,     height_in: float = 3,     dpi: int = 150,     km_tick: int | None = 25,  # distance between numeric ticks ) -> Path:     """Draw profile and save to *outfile*.      - Bottom axis: numeric kilometre ticks.     - Top (secondary) axis: stage labels (vertical text).     - Single visible frame (left & bottom spines) at origin; no grid.     """     fig, ax = plt.subplots(figsize=(width_in, height_in), dpi=dpi)      # Profile line ----------------------------------------------------------     ax.plot(df["dist_km"], df["elev_smoothed"], linewidth=1.2)      # Axes spines at origin --------------------------------------------------     ax.spines["left"].set_position("zero")     ax.spines["bottom"].set_position("zero")     ax.spines["right"].set_visible(False)     ax.spines["top"].set_visible(False)     ax.xaxis.set_ticks_position("bottom")     ax.yaxis.set_ticks_position("left")      # Numeric kilometre ticks ----------------------------------------------     if km_tick:         ax.xaxis.set_major_locator(ticker.MultipleLocator(km_tick))      # Stage dividers --------------------------------------------------------     for x in breaks:         ax.axvline(x=x, color="grey", linestyle="--", linewidth=0.8)      # Secondary x‑axis for labels ------------------------------------------     secax = ax.secondary_xaxis("top")     secax.set_xticks(breaks)     secax.set_xticklabels(labels, rotation=90, fontsize=8,                           ha="center", va="bottom")     # Hide secondary spine to keep single‑frame aesthetic     secax.spines["top"].set_visible(False)     secax.spines["right"].set_visible(False)     secax.spines["left"].set_visible(False)      # Axis limits & labels ---------------------------------------------------     y_min, y_max = df["elev_smoothed"].min(), df["elev_smoothed"].max() + 10     ax.set_ylim(min(0, y_min), y_max)     ax.set_xlim(0, max(df["dist_km"] + 10))     ax.set_xlabel("Distance (km)")     ax.set_ylabel("Elevation (m)")       # Layout tweaks ----------------------------------------------------------     fig.tight_layout()     fig.subplots_adjust(top=0.60)  # space for vertical stage labels      fig.savefig(outfile)     plt.close(fig)     return outfile  # --------------------------------------------------------------------------- # CLI -----------------------------------------------------------------------   def main() -> None:     cli = argparse.ArgumentParser(         description="Emit Wikivoyage stage descriptions and a PNG profile.")     cli.add_argument("-i", "--infile", type=Path, default=DEFAULT_INFILE,                      help="Pickle produced by fetch_profile.py")     cli.add_argument("-o", "--output", metavar="FILE.wiki",                      help="Write description blocks here (default: <infile>.wiki)")     cli.add_argument("--img", metavar="FILE.png", type=Path, default=DEFAULT_IMG,                      help="PNG path for the rendered profile")     cli.add_argument("--smooth-window", type=int, default=SMOOTH_WIN,                      help="Savitzky–Golay window length (odd)")     cli.add_argument("--smooth-order", type=int, default=SMOOTH_ORDER,                      help="Savitzky–Golay polynomial order")     cli.add_argument("--km-tick", type=int, default=25,                      help="Distance between numeric kilometre ticks (default 25)")     args = cli.parse_args()      # Validate pickle --------------------------------------------------------     if not args.infile.exists():         raise SystemExit(             f"Cache {args.infile} not found – run fetch_profile.py first.")      with open(args.infile, "rb") as f:         bundle = pickle.load(f)     df: pd.DataFrame = bundle["df"]     breaks: List[float] = bundle["breaks"]     labels: List[str] = bundle["labels"]      # Smooth elevation ------------------------------------------------------     df = df.copy()     df["elev_smoothed"] = savgol_filter(         df["elev_m"].to_numpy(float),         window_length=max(3, args.smooth_window | 1),  # force odd         polyorder=args.smooth_order,         mode="nearest",     )      # Outputs ---------------------------------------------------------------     img_path = make_matplotlib_chart(         df, breaks, labels, outfile=args.img, km_tick=args.km_tick)     desc_path = Path(         args.output) if args.output else args.infile.with_suffix(".wiki")     desc_path.write_text(build_descriptions(         df, breaks, labels) + "\n", encoding="utf-8")      print(f"[saved] {desc_path}\n[saved] {img_path}")   # --------------------------------------------------------------------------- if __name__ == "__main__":     main() 

4720d877-0865-4bdc-99b8-1443ac9f93a6 (talk) 10:55, 9 May 2025 (UTC)Reply