{
 "cells": [
  {
   "cell_type": "markdown",
   "id": "16473b52-64fd-425b-bab4-356708192bab",
   "metadata": {
    "editable": true,
    "slideshow": {
     "slide_type": ""
    },
    "tags": []
   },
   "source": [
    "# QC protocol for Private Weather Stations"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "957fd8e7-df3f-47d8-b57c-e60e26513338",
   "metadata": {},
   "source": [
    "This notebook presents how to use the Python package `pypwsqc`, a quality assurance protocol developed for automated private weather stations (PWS). The protocol consists of three filters; the Faulty Zero filter, the High Influx filter and the Station Outlier filter.\n",
    "\n",
    "The package is based on the original R code available at https://github.com/LottedeVos/PWSQC/.\n",
    "\n",
    "Publication: de Vos, L. W., Leijnse, H., Overeem, A., & Uijlenhoet, R. (2019). Quality control for crowdsourced personal weather stations to enable operational rainfall monitoring. Geophysical Research Letters, 46(15), 8820-8829\n",
    "\n",
    "`pypwsqc` depends on the `poligrain`, `xarray`, `pandas` and `numpy` packages. Make sure to install and import the required packages first."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 1,
   "id": "78857b63-6c25-4391-be95-119a6e906aeb",
   "metadata": {},
   "outputs": [],
   "source": [
    "import numpy as np\n",
    "import poligrain as plg\n",
    "import xarray as xr\n",
    "\n",
    "import pypwsqc"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "d852ea47-40ab-4956-ac7c-f9aaa1dee996",
   "metadata": {},
   "source": [
    "## Download example data\n",
    "\n",
    "In this example, we use an open PWS dataset from Amsterdam, called the \"AMS PWS\" dataset. By running the cell below, an example NetCDF-file will be downloaded to your current repository (if your machine is connected to the internet)."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 2,
   "id": "25b78fd5-c92b-4854-9a56-cefe8450f734",
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "  % Total    % Received % Xferd  Average Speed   Time    Time     Time  Current\n",
      "                                 Dload  Upload   Total   Spent    Left  Speed\n",
      "  0     0    0     0    0     0      0      0 --:--:-- --:--:-- --:--:--     0\n",
      "100 5687k  100 5687k    0     0  3800k      0  0:00:01  0:00:01 --:--:-- 6267k\n"
     ]
    }
   ],
   "source": [
    "!curl -OL https://github.com/OpenSenseAction/OS_data_format_conventions/raw/main/notebooks/data/OpenSense_PWS_example_format_data.nc"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "e420966c-eba1-4a40-aa4b-e1f10e7bbe26",
   "metadata": {},
   "source": [
    "## Data preparations"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "fa7460be-a65e-4549-831d-f11fa418a21c",
   "metadata": {},
   "source": [
    "This package handles rainfall data as `xarray`  Datasets. The data set must have `time` and `id` dimensions, `latitude` and `longitude` as coordinates, and `rainfall` as data variable.\n",
    "\n",
    "An example of how to convert .csv data to a `xarray` dataset is found [here](https://github.com/OpenSenseAction/OS_data_format_conventions/blob/main/notebooks/PWS_example_dataset.ipynb).\n",
    "\n",
    "We now load the data set under the name  `ds_pws`."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 3,
   "id": "9a8f4054-4282-42a0-bfff-c12a55241672",
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/html": [
       "<div><svg style=\"position: absolute; width: 0; height: 0; overflow: hidden\">\n",
       "<defs>\n",
       "<symbol id=\"icon-database\" viewBox=\"0 0 32 32\">\n",
       "<path d=\"M16 0c-8.837 0-16 2.239-16 5v4c0 2.761 7.163 5 16 5s16-2.239 16-5v-4c0-2.761-7.163-5-16-5z\"></path>\n",
       "<path d=\"M16 17c-8.837 0-16-2.239-16-5v6c0 2.761 7.163 5 16 5s16-2.239 16-5v-6c0 2.761-7.163 5-16 5z\"></path>\n",
       "<path d=\"M16 26c-8.837 0-16-2.239-16-5v6c0 2.761 7.163 5 16 5s16-2.239 16-5v-6c0 2.761-7.163 5-16 5z\"></path>\n",
       "</symbol>\n",
       "<symbol id=\"icon-file-text2\" viewBox=\"0 0 32 32\">\n",
       "<path d=\"M28.681 7.159c-0.694-0.947-1.662-2.053-2.724-3.116s-2.169-2.030-3.116-2.724c-1.612-1.182-2.393-1.319-2.841-1.319h-15.5c-1.378 0-2.5 1.121-2.5 2.5v27c0 1.378 1.122 2.5 2.5 2.5h23c1.378 0 2.5-1.122 2.5-2.5v-19.5c0-0.448-0.137-1.23-1.319-2.841zM24.543 5.457c0.959 0.959 1.712 1.825 2.268 2.543h-4.811v-4.811c0.718 0.556 1.584 1.309 2.543 2.268zM28 29.5c0 0.271-0.229 0.5-0.5 0.5h-23c-0.271 0-0.5-0.229-0.5-0.5v-27c0-0.271 0.229-0.5 0.5-0.5 0 0 15.499-0 15.5 0v7c0 0.552 0.448 1 1 1h7v19.5z\"></path>\n",
       "<path d=\"M23 26h-14c-0.552 0-1-0.448-1-1s0.448-1 1-1h14c0.552 0 1 0.448 1 1s-0.448 1-1 1z\"></path>\n",
       "<path d=\"M23 22h-14c-0.552 0-1-0.448-1-1s0.448-1 1-1h14c0.552 0 1 0.448 1 1s-0.448 1-1 1z\"></path>\n",
       "<path d=\"M23 18h-14c-0.552 0-1-0.448-1-1s0.448-1 1-1h14c0.552 0 1 0.448 1 1s-0.448 1-1 1z\"></path>\n",
       "</symbol>\n",
       "</defs>\n",
       "</svg>\n",
       "<style>/* CSS stylesheet for displaying xarray objects in jupyterlab.\n",
       " *\n",
       " */\n",
       "\n",
       ":root {\n",
       "  --xr-font-color0: var(--jp-content-font-color0, rgba(0, 0, 0, 1));\n",
       "  --xr-font-color2: var(--jp-content-font-color2, rgba(0, 0, 0, 0.54));\n",
       "  --xr-font-color3: var(--jp-content-font-color3, rgba(0, 0, 0, 0.38));\n",
       "  --xr-border-color: var(--jp-border-color2, #e0e0e0);\n",
       "  --xr-disabled-color: var(--jp-layout-color3, #bdbdbd);\n",
       "  --xr-background-color: var(--jp-layout-color0, white);\n",
       "  --xr-background-color-row-even: var(--jp-layout-color1, white);\n",
       "  --xr-background-color-row-odd: var(--jp-layout-color2, #eeeeee);\n",
       "}\n",
       "\n",
       "html[theme=dark],\n",
       "body[data-theme=dark],\n",
       "body.vscode-dark {\n",
       "  --xr-font-color0: rgba(255, 255, 255, 1);\n",
       "  --xr-font-color2: rgba(255, 255, 255, 0.54);\n",
       "  --xr-font-color3: rgba(255, 255, 255, 0.38);\n",
       "  --xr-border-color: #1F1F1F;\n",
       "  --xr-disabled-color: #515151;\n",
       "  --xr-background-color: #111111;\n",
       "  --xr-background-color-row-even: #111111;\n",
       "  --xr-background-color-row-odd: #313131;\n",
       "}\n",
       "\n",
       ".xr-wrap {\n",
       "  display: block !important;\n",
       "  min-width: 300px;\n",
       "  max-width: 700px;\n",
       "}\n",
       "\n",
       ".xr-text-repr-fallback {\n",
       "  /* fallback to plain text repr when CSS is not injected (untrusted notebook) */\n",
       "  display: none;\n",
       "}\n",
       "\n",
       ".xr-header {\n",
       "  padding-top: 6px;\n",
       "  padding-bottom: 6px;\n",
       "  margin-bottom: 4px;\n",
       "  border-bottom: solid 1px var(--xr-border-color);\n",
       "}\n",
       "\n",
       ".xr-header > div,\n",
       ".xr-header > ul {\n",
       "  display: inline;\n",
       "  margin-top: 0;\n",
       "  margin-bottom: 0;\n",
       "}\n",
       "\n",
       ".xr-obj-type,\n",
       ".xr-array-name {\n",
       "  margin-left: 2px;\n",
       "  margin-right: 10px;\n",
       "}\n",
       "\n",
       ".xr-obj-type {\n",
       "  color: var(--xr-font-color2);\n",
       "}\n",
       "\n",
       ".xr-sections {\n",
       "  padding-left: 0 !important;\n",
       "  display: grid;\n",
       "  grid-template-columns: 150px auto auto 1fr 20px 20px;\n",
       "}\n",
       "\n",
       ".xr-section-item {\n",
       "  display: contents;\n",
       "}\n",
       "\n",
       ".xr-section-item input {\n",
       "  display: none;\n",
       "}\n",
       "\n",
       ".xr-section-item input + label {\n",
       "  color: var(--xr-disabled-color);\n",
       "}\n",
       "\n",
       ".xr-section-item input:enabled + label {\n",
       "  cursor: pointer;\n",
       "  color: var(--xr-font-color2);\n",
       "}\n",
       "\n",
       ".xr-section-item input:enabled + label:hover {\n",
       "  color: var(--xr-font-color0);\n",
       "}\n",
       "\n",
       ".xr-section-summary {\n",
       "  grid-column: 1;\n",
       "  color: var(--xr-font-color2);\n",
       "  font-weight: 500;\n",
       "}\n",
       "\n",
       ".xr-section-summary > span {\n",
       "  display: inline-block;\n",
       "  padding-left: 0.5em;\n",
       "}\n",
       "\n",
       ".xr-section-summary-in:disabled + label {\n",
       "  color: var(--xr-font-color2);\n",
       "}\n",
       "\n",
       ".xr-section-summary-in + label:before {\n",
       "  display: inline-block;\n",
       "  content: '►';\n",
       "  font-size: 11px;\n",
       "  width: 15px;\n",
       "  text-align: center;\n",
       "}\n",
       "\n",
       ".xr-section-summary-in:disabled + label:before {\n",
       "  color: var(--xr-disabled-color);\n",
       "}\n",
       "\n",
       ".xr-section-summary-in:checked + label:before {\n",
       "  content: '▼';\n",
       "}\n",
       "\n",
       ".xr-section-summary-in:checked + label > span {\n",
       "  display: none;\n",
       "}\n",
       "\n",
       ".xr-section-summary,\n",
       ".xr-section-inline-details {\n",
       "  padding-top: 4px;\n",
       "  padding-bottom: 4px;\n",
       "}\n",
       "\n",
       ".xr-section-inline-details {\n",
       "  grid-column: 2 / -1;\n",
       "}\n",
       "\n",
       ".xr-section-details {\n",
       "  display: none;\n",
       "  grid-column: 1 / -1;\n",
       "  margin-bottom: 5px;\n",
       "}\n",
       "\n",
       ".xr-section-summary-in:checked ~ .xr-section-details {\n",
       "  display: contents;\n",
       "}\n",
       "\n",
       ".xr-array-wrap {\n",
       "  grid-column: 1 / -1;\n",
       "  display: grid;\n",
       "  grid-template-columns: 20px auto;\n",
       "}\n",
       "\n",
       ".xr-array-wrap > label {\n",
       "  grid-column: 1;\n",
       "  vertical-align: top;\n",
       "}\n",
       "\n",
       ".xr-preview {\n",
       "  color: var(--xr-font-color3);\n",
       "}\n",
       "\n",
       ".xr-array-preview,\n",
       ".xr-array-data {\n",
       "  padding: 0 5px !important;\n",
       "  grid-column: 2;\n",
       "}\n",
       "\n",
       ".xr-array-data,\n",
       ".xr-array-in:checked ~ .xr-array-preview {\n",
       "  display: none;\n",
       "}\n",
       "\n",
       ".xr-array-in:checked ~ .xr-array-data,\n",
       ".xr-array-preview {\n",
       "  display: inline-block;\n",
       "}\n",
       "\n",
       ".xr-dim-list {\n",
       "  display: inline-block !important;\n",
       "  list-style: none;\n",
       "  padding: 0 !important;\n",
       "  margin: 0;\n",
       "}\n",
       "\n",
       ".xr-dim-list li {\n",
       "  display: inline-block;\n",
       "  padding: 0;\n",
       "  margin: 0;\n",
       "}\n",
       "\n",
       ".xr-dim-list:before {\n",
       "  content: '(';\n",
       "}\n",
       "\n",
       ".xr-dim-list:after {\n",
       "  content: ')';\n",
       "}\n",
       "\n",
       ".xr-dim-list li:not(:last-child):after {\n",
       "  content: ',';\n",
       "  padding-right: 5px;\n",
       "}\n",
       "\n",
       ".xr-has-index {\n",
       "  font-weight: bold;\n",
       "}\n",
       "\n",
       ".xr-var-list,\n",
       ".xr-var-item {\n",
       "  display: contents;\n",
       "}\n",
       "\n",
       ".xr-var-item > div,\n",
       ".xr-var-item label,\n",
       ".xr-var-item > .xr-var-name span {\n",
       "  background-color: var(--xr-background-color-row-even);\n",
       "  margin-bottom: 0;\n",
       "}\n",
       "\n",
       ".xr-var-item > .xr-var-name:hover span {\n",
       "  padding-right: 5px;\n",
       "}\n",
       "\n",
       ".xr-var-list > li:nth-child(odd) > div,\n",
       ".xr-var-list > li:nth-child(odd) > label,\n",
       ".xr-var-list > li:nth-child(odd) > .xr-var-name span {\n",
       "  background-color: var(--xr-background-color-row-odd);\n",
       "}\n",
       "\n",
       ".xr-var-name {\n",
       "  grid-column: 1;\n",
       "}\n",
       "\n",
       ".xr-var-dims {\n",
       "  grid-column: 2;\n",
       "}\n",
       "\n",
       ".xr-var-dtype {\n",
       "  grid-column: 3;\n",
       "  text-align: right;\n",
       "  color: var(--xr-font-color2);\n",
       "}\n",
       "\n",
       ".xr-var-preview {\n",
       "  grid-column: 4;\n",
       "}\n",
       "\n",
       ".xr-index-preview {\n",
       "  grid-column: 2 / 5;\n",
       "  color: var(--xr-font-color2);\n",
       "}\n",
       "\n",
       ".xr-var-name,\n",
       ".xr-var-dims,\n",
       ".xr-var-dtype,\n",
       ".xr-preview,\n",
       ".xr-attrs dt {\n",
       "  white-space: nowrap;\n",
       "  overflow: hidden;\n",
       "  text-overflow: ellipsis;\n",
       "  padding-right: 10px;\n",
       "}\n",
       "\n",
       ".xr-var-name:hover,\n",
       ".xr-var-dims:hover,\n",
       ".xr-var-dtype:hover,\n",
       ".xr-attrs dt:hover {\n",
       "  overflow: visible;\n",
       "  width: auto;\n",
       "  z-index: 1;\n",
       "}\n",
       "\n",
       ".xr-var-attrs,\n",
       ".xr-var-data,\n",
       ".xr-index-data {\n",
       "  display: none;\n",
       "  background-color: var(--xr-background-color) !important;\n",
       "  padding-bottom: 5px !important;\n",
       "}\n",
       "\n",
       ".xr-var-attrs-in:checked ~ .xr-var-attrs,\n",
       ".xr-var-data-in:checked ~ .xr-var-data,\n",
       ".xr-index-data-in:checked ~ .xr-index-data {\n",
       "  display: block;\n",
       "}\n",
       "\n",
       ".xr-var-data > table {\n",
       "  float: right;\n",
       "}\n",
       "\n",
       ".xr-var-name span,\n",
       ".xr-var-data,\n",
       ".xr-index-name div,\n",
       ".xr-index-data,\n",
       ".xr-attrs {\n",
       "  padding-left: 25px !important;\n",
       "}\n",
       "\n",
       ".xr-attrs,\n",
       ".xr-var-attrs,\n",
       ".xr-var-data,\n",
       ".xr-index-data {\n",
       "  grid-column: 1 / -1;\n",
       "}\n",
       "\n",
       "dl.xr-attrs {\n",
       "  padding: 0;\n",
       "  margin: 0;\n",
       "  display: grid;\n",
       "  grid-template-columns: 125px auto;\n",
       "}\n",
       "\n",
       ".xr-attrs dt,\n",
       ".xr-attrs dd {\n",
       "  padding: 0;\n",
       "  margin: 0;\n",
       "  float: left;\n",
       "  padding-right: 10px;\n",
       "  width: auto;\n",
       "}\n",
       "\n",
       ".xr-attrs dt {\n",
       "  font-weight: normal;\n",
       "  grid-column: 1;\n",
       "}\n",
       "\n",
       ".xr-attrs dt:hover span {\n",
       "  display: inline-block;\n",
       "  background: var(--xr-background-color);\n",
       "  padding-right: 10px;\n",
       "}\n",
       "\n",
       ".xr-attrs dd {\n",
       "  grid-column: 2;\n",
       "  white-space: pre-wrap;\n",
       "  word-break: break-all;\n",
       "}\n",
       "\n",
       ".xr-icon-database,\n",
       ".xr-icon-file-text2,\n",
       ".xr-no-icon {\n",
       "  display: inline-block;\n",
       "  vertical-align: middle;\n",
       "  width: 1em;\n",
       "  height: 1.5em !important;\n",
       "  stroke-width: 0;\n",
       "  stroke: currentColor;\n",
       "  fill: currentColor;\n",
       "}\n",
       "</style><pre class='xr-text-repr-fallback'>&lt;xarray.Dataset&gt;\n",
       "Dimensions:    (time: 219168, id: 134)\n",
       "Coordinates:\n",
       "  * time       (time) datetime64[ns] 2016-05-01T00:05:00 ... 2018-06-01\n",
       "  * id         (id) &lt;U6 &#x27;ams1&#x27; &#x27;ams2&#x27; &#x27;ams3&#x27; ... &#x27;ams132&#x27; &#x27;ams133&#x27; &#x27;ams134&#x27;\n",
       "    elevation  (id) &lt;U3 ...\n",
       "    latitude   (id) float64 ...\n",
       "    longitude  (id) float64 ...\n",
       "Data variables:\n",
       "    rainfall   (id, time) float64 ...\n",
       "Attributes:\n",
       "    title:                 PWS data from Amsterdam\n",
       "    file author:           Maximilian Graf\n",
       "    institution:           Wageningen University and Research, Department of ...\n",
       "    date:                  2022-10-18 10:32:00\n",
       "    source:                Netamo PWS\n",
       "    history:               Data derived and reformated from the originally pu...\n",
       "    naming convention:     OpenSense-0.1\n",
       "    license restrictions:  CC-BY 4.0 https://creativecommons.org/licenses/by/...\n",
       "    reference:             https://doi.org/10.1029/2019GL083731\n",
       "    comment:               </pre><div class='xr-wrap' style='display:none'><div class='xr-header'><div class='xr-obj-type'>xarray.Dataset</div></div><ul class='xr-sections'><li class='xr-section-item'><input id='section-7d8a1717-c157-4010-83bf-b186f783553f' class='xr-section-summary-in' type='checkbox' disabled ><label for='section-7d8a1717-c157-4010-83bf-b186f783553f' class='xr-section-summary'  title='Expand/collapse section'>Dimensions:</label><div class='xr-section-inline-details'><ul class='xr-dim-list'><li><span class='xr-has-index'>time</span>: 219168</li><li><span class='xr-has-index'>id</span>: 134</li></ul></div><div class='xr-section-details'></div></li><li class='xr-section-item'><input id='section-bac70251-4878-4582-b866-34290792a6bc' class='xr-section-summary-in' type='checkbox'  checked><label for='section-bac70251-4878-4582-b866-34290792a6bc' class='xr-section-summary' >Coordinates: <span>(5)</span></label><div class='xr-section-inline-details'></div><div class='xr-section-details'><ul class='xr-var-list'><li class='xr-var-item'><div class='xr-var-name'><span class='xr-has-index'>time</span></div><div class='xr-var-dims'>(time)</div><div class='xr-var-dtype'>datetime64[ns]</div><div class='xr-var-preview xr-preview'>2016-05-01T00:05:00 ... 
2018-06-01</div><input id='attrs-51698e34-fe7b-45e1-a43b-a292cedd6965' class='xr-var-attrs-in' type='checkbox' ><label for='attrs-51698e34-fe7b-45e1-a43b-a292cedd6965' title='Show/Hide attributes'><svg class='icon xr-icon-file-text2'><use xlink:href='#icon-file-text2'></use></svg></label><input id='data-97780c0e-4ff2-430b-9f75-b45f2fbc46b6' class='xr-var-data-in' type='checkbox'><label for='data-97780c0e-4ff2-430b-9f75-b45f2fbc46b6' title='Show/Hide data repr'><svg class='icon xr-icon-database'><use xlink:href='#icon-database'></use></svg></label><div class='xr-var-attrs'><dl class='xr-attrs'><dt><span>unit :</span></dt><dd>seconds since 1970-01-01 00:00:00</dd></dl></div><div class='xr-var-data'><pre>array([&#x27;2016-05-01T00:05:00.000000000&#x27;, &#x27;2016-05-01T00:10:00.000000000&#x27;,\n",
       "       &#x27;2016-05-01T00:15:00.000000000&#x27;, ..., &#x27;2018-05-31T23:50:00.000000000&#x27;,\n",
       "       &#x27;2018-05-31T23:55:00.000000000&#x27;, &#x27;2018-06-01T00:00:00.000000000&#x27;],\n",
       "      dtype=&#x27;datetime64[ns]&#x27;)</pre></div></li><li class='xr-var-item'><div class='xr-var-name'><span class='xr-has-index'>id</span></div><div class='xr-var-dims'>(id)</div><div class='xr-var-dtype'>&lt;U6</div><div class='xr-var-preview xr-preview'>&#x27;ams1&#x27; &#x27;ams2&#x27; ... &#x27;ams133&#x27; &#x27;ams134&#x27;</div><input id='attrs-9cfb09ba-f13f-4565-bbea-89e84a9d0821' class='xr-var-attrs-in' type='checkbox' disabled><label for='attrs-9cfb09ba-f13f-4565-bbea-89e84a9d0821' title='Show/Hide attributes'><svg class='icon xr-icon-file-text2'><use xlink:href='#icon-file-text2'></use></svg></label><input id='data-05057728-6da8-420d-9b34-0f3eb56cff19' class='xr-var-data-in' type='checkbox'><label for='data-05057728-6da8-420d-9b34-0f3eb56cff19' title='Show/Hide data repr'><svg class='icon xr-icon-database'><use xlink:href='#icon-database'></use></svg></label><div class='xr-var-attrs'><dl class='xr-attrs'></dl></div><div class='xr-var-data'><pre>array([&#x27;ams1&#x27;, &#x27;ams2&#x27;, &#x27;ams3&#x27;, &#x27;ams4&#x27;, &#x27;ams5&#x27;, &#x27;ams6&#x27;, &#x27;ams7&#x27;, &#x27;ams8&#x27;, &#x27;ams9&#x27;,\n",
       "       &#x27;ams10&#x27;, &#x27;ams11&#x27;, &#x27;ams12&#x27;, &#x27;ams13&#x27;, &#x27;ams14&#x27;, &#x27;ams15&#x27;, &#x27;ams16&#x27;, &#x27;ams17&#x27;,\n",
       "       &#x27;ams18&#x27;, &#x27;ams19&#x27;, &#x27;ams20&#x27;, &#x27;ams21&#x27;, &#x27;ams22&#x27;, &#x27;ams23&#x27;, &#x27;ams24&#x27;, &#x27;ams25&#x27;,\n",
       "       &#x27;ams26&#x27;, &#x27;ams27&#x27;, &#x27;ams28&#x27;, &#x27;ams29&#x27;, &#x27;ams30&#x27;, &#x27;ams31&#x27;, &#x27;ams32&#x27;, &#x27;ams33&#x27;,\n",
       "       &#x27;ams34&#x27;, &#x27;ams35&#x27;, &#x27;ams36&#x27;, &#x27;ams37&#x27;, &#x27;ams38&#x27;, &#x27;ams39&#x27;, &#x27;ams40&#x27;, &#x27;ams41&#x27;,\n",
       "       &#x27;ams42&#x27;, &#x27;ams43&#x27;, &#x27;ams44&#x27;, &#x27;ams45&#x27;, &#x27;ams46&#x27;, &#x27;ams47&#x27;, &#x27;ams48&#x27;, &#x27;ams49&#x27;,\n",
       "       &#x27;ams50&#x27;, &#x27;ams51&#x27;, &#x27;ams52&#x27;, &#x27;ams53&#x27;, &#x27;ams54&#x27;, &#x27;ams55&#x27;, &#x27;ams56&#x27;, &#x27;ams57&#x27;,\n",
       "       &#x27;ams58&#x27;, &#x27;ams59&#x27;, &#x27;ams60&#x27;, &#x27;ams61&#x27;, &#x27;ams62&#x27;, &#x27;ams63&#x27;, &#x27;ams64&#x27;, &#x27;ams65&#x27;,\n",
       "       &#x27;ams66&#x27;, &#x27;ams67&#x27;, &#x27;ams68&#x27;, &#x27;ams69&#x27;, &#x27;ams70&#x27;, &#x27;ams71&#x27;, &#x27;ams72&#x27;, &#x27;ams73&#x27;,\n",
       "       &#x27;ams74&#x27;, &#x27;ams75&#x27;, &#x27;ams76&#x27;, &#x27;ams77&#x27;, &#x27;ams78&#x27;, &#x27;ams79&#x27;, &#x27;ams80&#x27;, &#x27;ams81&#x27;,\n",
       "       &#x27;ams82&#x27;, &#x27;ams83&#x27;, &#x27;ams84&#x27;, &#x27;ams85&#x27;, &#x27;ams86&#x27;, &#x27;ams87&#x27;, &#x27;ams88&#x27;, &#x27;ams89&#x27;,\n",
       "       &#x27;ams90&#x27;, &#x27;ams91&#x27;, &#x27;ams92&#x27;, &#x27;ams93&#x27;, &#x27;ams94&#x27;, &#x27;ams95&#x27;, &#x27;ams96&#x27;, &#x27;ams97&#x27;,\n",
       "       &#x27;ams98&#x27;, &#x27;ams99&#x27;, &#x27;ams100&#x27;, &#x27;ams101&#x27;, &#x27;ams102&#x27;, &#x27;ams103&#x27;, &#x27;ams104&#x27;,\n",
       "       &#x27;ams105&#x27;, &#x27;ams106&#x27;, &#x27;ams107&#x27;, &#x27;ams108&#x27;, &#x27;ams109&#x27;, &#x27;ams110&#x27;, &#x27;ams111&#x27;,\n",
       "       &#x27;ams112&#x27;, &#x27;ams113&#x27;, &#x27;ams114&#x27;, &#x27;ams115&#x27;, &#x27;ams116&#x27;, &#x27;ams117&#x27;, &#x27;ams118&#x27;,\n",
       "       &#x27;ams119&#x27;, &#x27;ams120&#x27;, &#x27;ams121&#x27;, &#x27;ams122&#x27;, &#x27;ams123&#x27;, &#x27;ams124&#x27;, &#x27;ams125&#x27;,\n",
       "       &#x27;ams126&#x27;, &#x27;ams127&#x27;, &#x27;ams128&#x27;, &#x27;ams129&#x27;, &#x27;ams130&#x27;, &#x27;ams131&#x27;, &#x27;ams132&#x27;,\n",
       "       &#x27;ams133&#x27;, &#x27;ams134&#x27;], dtype=&#x27;&lt;U6&#x27;)</pre></div></li><li class='xr-var-item'><div class='xr-var-name'><span>elevation</span></div><div class='xr-var-dims'>(id)</div><div class='xr-var-dtype'>&lt;U3</div><div class='xr-var-preview xr-preview'>...</div><input id='attrs-fdd31127-86c4-41be-b2cc-12ef0a72a8c9' class='xr-var-attrs-in' type='checkbox' ><label for='attrs-fdd31127-86c4-41be-b2cc-12ef0a72a8c9' title='Show/Hide attributes'><svg class='icon xr-icon-file-text2'><use xlink:href='#icon-file-text2'></use></svg></label><input id='data-9c601cca-c498-4db4-ae98-adaea6c702fa' class='xr-var-data-in' type='checkbox'><label for='data-9c601cca-c498-4db4-ae98-adaea6c702fa' title='Show/Hide data repr'><svg class='icon xr-icon-database'><use xlink:href='#icon-database'></use></svg></label><div class='xr-var-attrs'><dl class='xr-attrs'><dt><span>units :</span></dt><dd>meters</dd><dt><span>longname :</span></dt><dd>meters_above_sea</dd></dl></div><div class='xr-var-data'><pre>[134 values with dtype=&lt;U3]</pre></div></li><li class='xr-var-item'><div class='xr-var-name'><span>latitude</span></div><div class='xr-var-dims'>(id)</div><div class='xr-var-dtype'>float64</div><div class='xr-var-preview xr-preview'>...</div><input id='attrs-2c085044-3ce0-4e28-99df-7356a723be15' class='xr-var-attrs-in' type='checkbox' ><label for='attrs-2c085044-3ce0-4e28-99df-7356a723be15' title='Show/Hide attributes'><svg class='icon xr-icon-file-text2'><use xlink:href='#icon-file-text2'></use></svg></label><input id='data-4ddc701f-5712-4c88-92d3-ff02090150ed' class='xr-var-data-in' type='checkbox'><label for='data-4ddc701f-5712-4c88-92d3-ff02090150ed' title='Show/Hide data repr'><svg class='icon xr-icon-database'><use xlink:href='#icon-database'></use></svg></label><div class='xr-var-attrs'><dl class='xr-attrs'><dt><span>units :</span></dt><dd>degrees in WGS84 projection</dd></dl></div><div class='xr-var-data'><pre>[134 values with 
dtype=float64]</pre></div></li><li class='xr-var-item'><div class='xr-var-name'><span>longitude</span></div><div class='xr-var-dims'>(id)</div><div class='xr-var-dtype'>float64</div><div class='xr-var-preview xr-preview'>...</div><input id='attrs-a031ce10-e627-4213-b20e-07be8f56f3d3' class='xr-var-attrs-in' type='checkbox' ><label for='attrs-a031ce10-e627-4213-b20e-07be8f56f3d3' title='Show/Hide attributes'><svg class='icon xr-icon-file-text2'><use xlink:href='#icon-file-text2'></use></svg></label><input id='data-76ebac25-2a29-4e60-8b18-ebf9d4ddae4c' class='xr-var-data-in' type='checkbox'><label for='data-76ebac25-2a29-4e60-8b18-ebf9d4ddae4c' title='Show/Hide data repr'><svg class='icon xr-icon-database'><use xlink:href='#icon-database'></use></svg></label><div class='xr-var-attrs'><dl class='xr-attrs'><dt><span>units :</span></dt><dd>degrees in WGS84 projection</dd></dl></div><div class='xr-var-data'><pre>[134 values with dtype=float64]</pre></div></li></ul></div></li><li class='xr-section-item'><input id='section-8e7c70c5-5988-4458-b0ce-8b45df4c71bf' class='xr-section-summary-in' type='checkbox'  checked><label for='section-8e7c70c5-5988-4458-b0ce-8b45df4c71bf' class='xr-section-summary' >Data variables: <span>(1)</span></label><div class='xr-section-inline-details'></div><div class='xr-section-details'><ul class='xr-var-list'><li class='xr-var-item'><div class='xr-var-name'><span>rainfall</span></div><div class='xr-var-dims'>(id, time)</div><div class='xr-var-dtype'>float64</div><div class='xr-var-preview xr-preview'>...</div><input id='attrs-31b802ec-6e10-4d99-82d0-018700670814' class='xr-var-attrs-in' type='checkbox' ><label for='attrs-31b802ec-6e10-4d99-82d0-018700670814' title='Show/Hide attributes'><svg class='icon xr-icon-file-text2'><use xlink:href='#icon-file-text2'></use></svg></label><input id='data-f242709c-6915-4352-8466-db0ea79e909c' class='xr-var-data-in' type='checkbox'><label for='data-f242709c-6915-4352-8466-db0ea79e909c' title='Show/Hide data 
repr'><svg class='icon xr-icon-database'><use xlink:href='#icon-database'></use></svg></label><div class='xr-var-attrs'><dl class='xr-attrs'><dt><span>name :</span></dt><dd>rainfall</dd><dt><span>long_name :</span></dt><dd>rainfall amount per time unit</dd><dt><span>units :</span></dt><dd>mm</dd><dt><span>coverage_contant_type :</span></dt><dd>physicalMeasurement</dd></dl></div><div class='xr-var-data'><pre>[29368512 values with dtype=float64]</pre></div></li></ul></div></li><li class='xr-section-item'><input id='section-b016269e-77d4-4a16-9fb7-ba9371118b78' class='xr-section-summary-in' type='checkbox'  ><label for='section-b016269e-77d4-4a16-9fb7-ba9371118b78' class='xr-section-summary' >Indexes: <span>(2)</span></label><div class='xr-section-inline-details'></div><div class='xr-section-details'><ul class='xr-var-list'><li class='xr-var-item'><div class='xr-index-name'><div>time</div></div><div class='xr-index-preview'>PandasIndex</div><div></div><input id='index-5dafb564-f441-4156-bbdf-979d5ea80a45' class='xr-index-data-in' type='checkbox'/><label for='index-5dafb564-f441-4156-bbdf-979d5ea80a45' title='Show/Hide index repr'><svg class='icon xr-icon-database'><use xlink:href='#icon-database'></use></svg></label><div class='xr-index-data'><pre>PandasIndex(DatetimeIndex([&#x27;2016-05-01 00:05:00&#x27;, &#x27;2016-05-01 00:10:00&#x27;,\n",
       "               &#x27;2016-05-01 00:15:00&#x27;, &#x27;2016-05-01 00:20:00&#x27;,\n",
       "               &#x27;2016-05-01 00:25:00&#x27;, &#x27;2016-05-01 00:30:00&#x27;,\n",
       "               &#x27;2016-05-01 00:35:00&#x27;, &#x27;2016-05-01 00:40:00&#x27;,\n",
       "               &#x27;2016-05-01 00:45:00&#x27;, &#x27;2016-05-01 00:50:00&#x27;,\n",
       "               ...\n",
       "               &#x27;2018-05-31 23:15:00&#x27;, &#x27;2018-05-31 23:20:00&#x27;,\n",
       "               &#x27;2018-05-31 23:25:00&#x27;, &#x27;2018-05-31 23:30:00&#x27;,\n",
       "               &#x27;2018-05-31 23:35:00&#x27;, &#x27;2018-05-31 23:40:00&#x27;,\n",
       "               &#x27;2018-05-31 23:45:00&#x27;, &#x27;2018-05-31 23:50:00&#x27;,\n",
       "               &#x27;2018-05-31 23:55:00&#x27;, &#x27;2018-06-01 00:00:00&#x27;],\n",
       "              dtype=&#x27;datetime64[ns]&#x27;, name=&#x27;time&#x27;, length=219168, freq=None))</pre></div></li><li class='xr-var-item'><div class='xr-index-name'><div>id</div></div><div class='xr-index-preview'>PandasIndex</div><div></div><input id='index-e0132d19-55c2-403c-9eb2-6efd55d47f8b' class='xr-index-data-in' type='checkbox'/><label for='index-e0132d19-55c2-403c-9eb2-6efd55d47f8b' title='Show/Hide index repr'><svg class='icon xr-icon-database'><use xlink:href='#icon-database'></use></svg></label><div class='xr-index-data'><pre>PandasIndex(Index([&#x27;ams1&#x27;, &#x27;ams2&#x27;, &#x27;ams3&#x27;, &#x27;ams4&#x27;, &#x27;ams5&#x27;, &#x27;ams6&#x27;, &#x27;ams7&#x27;, &#x27;ams8&#x27;, &#x27;ams9&#x27;,\n",
       "       &#x27;ams10&#x27;,\n",
       "       ...\n",
       "       &#x27;ams125&#x27;, &#x27;ams126&#x27;, &#x27;ams127&#x27;, &#x27;ams128&#x27;, &#x27;ams129&#x27;, &#x27;ams130&#x27;, &#x27;ams131&#x27;,\n",
       "       &#x27;ams132&#x27;, &#x27;ams133&#x27;, &#x27;ams134&#x27;],\n",
       "      dtype=&#x27;object&#x27;, name=&#x27;id&#x27;, length=134))</pre></div></li></ul></div></li><li class='xr-section-item'><input id='section-5d28fede-2f5d-4665-8bc8-f37b968ddb11' class='xr-section-summary-in' type='checkbox'  ><label for='section-5d28fede-2f5d-4665-8bc8-f37b968ddb11' class='xr-section-summary' >Attributes: <span>(10)</span></label><div class='xr-section-inline-details'></div><div class='xr-section-details'><dl class='xr-attrs'><dt><span>title :</span></dt><dd>PWS data from Amsterdam</dd><dt><span>file author :</span></dt><dd>Maximilian Graf</dd><dt><span>institution :</span></dt><dd>Wageningen University and Research, Department of Environmental Sciences</dd><dt><span>date :</span></dt><dd>2022-10-18 10:32:00</dd><dt><span>source :</span></dt><dd>Netamo PWS</dd><dt><span>history :</span></dt><dd>Data derived and reformated from the originally published dataset</dd><dt><span>naming convention :</span></dt><dd>OpenSense-0.1</dd><dt><span>license restrictions :</span></dt><dd>CC-BY 4.0 https://creativecommons.org/licenses/by/4.0/</dd><dt><span>reference :</span></dt><dd>https://doi.org/10.1029/2019GL083731</dd><dt><span>comment :</span></dt><dd></dd></dl></div></li></ul></div></div>"
      ],
      "text/plain": [
       "<xarray.Dataset>\n",
       "Dimensions:    (time: 219168, id: 134)\n",
       "Coordinates:\n",
       "  * time       (time) datetime64[ns] 2016-05-01T00:05:00 ... 2018-06-01\n",
       "  * id         (id) <U6 'ams1' 'ams2' 'ams3' ... 'ams132' 'ams133' 'ams134'\n",
       "    elevation  (id) <U3 ...\n",
       "    latitude   (id) float64 ...\n",
       "    longitude  (id) float64 ...\n",
       "Data variables:\n",
       "    rainfall   (id, time) float64 ...\n",
       "Attributes:\n",
       "    title:                 PWS data from Amsterdam\n",
       "    file author:           Maximilian Graf\n",
       "    institution:           Wageningen University and Research, Department of ...\n",
       "    date:                  2022-10-18 10:32:00\n",
       "    source:                Netamo PWS\n",
       "    history:               Data derived and reformated from the originally pu...\n",
       "    naming convention:     OpenSense-0.1\n",
       "    license restrictions:  CC-BY 4.0 https://creativecommons.org/licenses/by/...\n",
       "    reference:             https://doi.org/10.1029/2019GL083731\n",
       "    comment:               "
      ]
     },
     "execution_count": 3,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "ds_pws = xr.open_dataset(\"OpenSense_PWS_example_format_data.nc\")\n",
    "ds_pws"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "2e154e21-f280-4fed-9512-e5f9d01c813f",
   "metadata": {},
   "source": [
    "### Reproject coordinates "
   ]
  },
  {
   "cell_type": "markdown",
   "id": "36741703-1d9a-4e5c-8fc4-e8f3d7019247",
   "metadata": {},
   "source": [
    "First, we reproject the station coordinates to a local metric coordinate reference system so that distances can be calculated in meters. In the Amsterdam example we use EPSG:25832. **Remember to use a local metric reference system for your use case!** The reprojection is done with the function `spatial.project_point_coordinates` from the `poligrain` package."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 4,
   "id": "cf503ba3-25b1-431a-a4cc-ed1ef75b1a8a",
   "metadata": {},
   "outputs": [],
   "source": [
    "ds_pws.coords[\"x\"], ds_pws.coords[\"y\"] = plg.spatial.project_point_coordinates(\n",
    "    x=ds_pws.longitude, y=ds_pws.latitude, target_projection=\"EPSG:25832\"\n",
    ")"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "7b3cb873-e151-4fe4-8391-48479cf0179a",
   "metadata": {},
   "source": [
    "### Create distance matrix\n",
    "\n",
    "Then, we calculate the distances between all stations in our data set. If your data set has a large number of stations this can take some time."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 5,
   "id": "0cb2270c-e3e8-43fe-9df0-8b88611c0c55",
   "metadata": {},
   "outputs": [],
   "source": [
    "distance_matrix = plg.spatial.calc_point_to_point_distances(ds_pws, ds_pws)"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "38d25c8d-47bc-4f3c-82f0-9749a0bf593f",
   "metadata": {},
   "source": [
    "### Calculate data variables "
   ]
  },
  {
   "cell_type": "markdown",
   "id": "b8514bec-544f-4c21-ab49-83ef5d6aa10b",
   "metadata": {},
   "source": [
    "Next, we calculate the data variables `nbrs_not_nan` and `reference`, which are needed to perform the quality control.\n",
    "\n",
    "`nbrs_not_nan`:\n",
    "Number of neighbours within a specified range `max_distance` around the station that report rainfall at each time step. The appropriate range depends on the use case and the area of interest. In this example we use 10,000 meters.\n",
    "\n",
    "`reference`:\n",
    "Median rainfall of all neighbouring stations within range `max_distance` of each station, excluding the station itself.\n",
    "\n",
    "`max_distance` is called `d` in the original publication."
   ]
  },
  {
   "cell_type": "markdown",
   "id": "c375fd9b-e3d6-4c5a-ace6-e0a4339dd239",
   "metadata": {},
   "source": [
    "#### Select considered range around each station"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 6,
   "id": "ad8fb97b-976d-4011-a9ec-c03a42d89f64",
   "metadata": {},
   "outputs": [],
   "source": [
    "max_distance = 10e3"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 7,
   "id": "4e122f83-40dd-47d7-81f9-7d772efb93e7",
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "CPU times: user 23 s, sys: 11.8 s, total: 34.8 s\n",
      "Wall time: 36.6 s\n"
     ]
    }
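   ],
   "source": [
    "%%time\n",
    "# Load the lazily opened data into memory\n",
    "ds_pws = ds_pws.load()\n",
    "\n",
    "nbrs_not_nan = []\n",
    "reference = []\n",
    "\n",
    "for pws_id in ds_pws.id.data:\n",
    "    # Neighbours: stations closer than max_distance, excluding the station itself\n",
    "    neighbor_ids = distance_matrix.id.data[\n",
    "        (distance_matrix.sel(id=pws_id) < max_distance)\n",
    "        & (distance_matrix.sel(id=pws_id) > 0)\n",
    "    ]\n",
    "\n",
    "    # Number of neighbours reporting non-NaN rainfall at each time step\n",
    "    N = ds_pws.rainfall.sel(id=neighbor_ids).notnull().sum(dim=\"id\")  # noqa: PD004\n",
    "    nbrs_not_nan.append(N)\n",
    "\n",
    "    # Median rainfall of the neighbours at each time step\n",
    "    median = ds_pws.sel(id=neighbor_ids).rainfall.median(dim=\"id\")\n",
    "    reference.append(median)\n",
    "\n",
    "# Combine the per-station results along the station dimension\n",
    "ds_pws[\"nbrs_not_nan\"] = xr.concat(nbrs_not_nan, dim=\"id\")\n",
    "ds_pws[\"reference\"] = xr.concat(reference, dim=\"id\")"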
   ]
  },
  {
   "cell_type": "markdown",
   "id": "42ccbafe-b798-49a4-b637-6ee99062e2a3",
   "metadata": {},
   "source": [
    "## Quality control"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "a35b4f4d-a945-49f6-9fb5-d67406ca79b3",
   "metadata": {},
   "source": [
    "The data set is now ready for the quality control."
   ]
  },
  {
   "cell_type": "markdown",
   "id": "c50cc8e6-1eb5-4d1f-af37-905f34bff5cb",
   "metadata": {},
   "source": [
    "### Faulty Zeros filter"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "6727aaa2-ad5b-4ab7-8d32-1e11a6fc0e57",
   "metadata": {},
   "source": [
    "Conditions for raising the Faulty Zeros (FZ) flag:\n",
    "\n",
    "* The median rainfall of the neighbouring stations within range `max_distance` is larger than zero for at least `nint` time intervals while the station itself reports zero rainfall.\n",
    "* The FZ flag remains 1 until the station reports nonzero rainfall.\n",
    "* The filter cannot be applied if fewer than `n_stat` neighbours are reporting data (the FZ flag is set to -1).\n",
    "* NOTE! The filter cannot be applied if the station has reported NaN data in the last `nint` time steps. This gives more -1 flags than the original R implementation, which does not use this condition. This choice was made to ensure that time steps without data at the evaluated station are not mistakenly interpreted as time steps that have passed the quality control (if they were flagged with 0) or as time steps with a Faulty Zeros issue (if they were flagged with 1).\n",
    "\n",
    "For the settings of the parameters `nint` and `n_stat`, see Table 1 in https://agupubs.onlinelibrary.wiley.com/doi/full/10.1029/2019GL083731"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "f6c99d15-684d-4f06-8c09-a2d68f1e3cf4",
   "metadata": {},
   "source": [
    "#### Run filter"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 8,
   "id": "34ddb1f4-2c7d-4310-8bc7-f3aaeb8926f8",
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "CPU times: user 3min 6s, sys: 3.6 s, total: 3min 10s\n",
      "Wall time: 3min 17s\n"
     ]
    }
   ],
   "source": [
    "%%time\n",
    "\n",
    "ds_pws_filtered = pypwsqc.flagging.fz_filter(ds_pws, nint=6, n_stat=5)"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "3a837ada-1ed9-42e5-bfde-f3f6fbf1e2e4",
   "metadata": {},
   "source": [
    "### High Influx filter"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "f6adf19c-bb62-4f54-8850-388aaab3dd28",
   "metadata": {},
   "source": [
    "Conditions for raising the High Influx (HI) flag:\n",
    "\n",
    "* If the median rainfall of the neighbours is below the threshold ϕA, the HI flag is raised if the station's rainfall is above the threshold ϕB.\n",
    "* If the median rainfall of the neighbours is above ϕA, the HI flag is raised if the station's rainfall exceeds the median times ϕB/ϕA.\n",
    "* The filter cannot be applied if fewer than `n_stat` neighbours are reporting data (the HI flag is set to -1).\n",
    "* NOTE! The filter cannot be applied if the station has reported NaN data in the last `nint` time steps. This gives more -1 flags than the original R implementation, which does not use this condition. This choice was made to ensure that time steps without data at the evaluated station are not mistakenly interpreted as time steps that have passed the quality control (if they were flagged with 0) or as time steps with a High Influx issue (if they were flagged with 1).\n",
    "\n",
    "For the settings of the parameters ϕA, ϕB and `n_stat`, see Table 1 in https://agupubs.onlinelibrary.wiley.com/doi/full/10.1029/2019GL083731. In the code, ϕA and ϕB are passed as `hi_thres_a` and `hi_thres_b`."
   ]
  },
  {
   "cell_type": "markdown",
   "id": "e1b83e12-fb54-4057-a3f7-d5a9ffc4cf3e",
   "metadata": {},
   "source": [
    "#### Run filter"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 9,
   "id": "697a6a82-2081-475c-81ad-4982cef90533",
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "CPU times: user 1.63 s, sys: 764 ms, total: 2.39 s\n",
      "Wall time: 2.5 s\n"
     ]
    }
   ],
   "source": [
    "%%time\n",
    "\n",
    "ds_pws_filtered = pypwsqc.flagging.hi_filter(\n",
    "    ds_pws, hi_thres_a=0.4, hi_thres_b=10, nint=6, n_stat=5\n",
    ")"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "3734095c-ca51-45cd-9a8c-6626ab837082",
   "metadata": {},
   "source": [
    "### Station Outlier filter"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "d308283d-ff36-4016-877c-91543443b30b",
   "metadata": {},
   "source": [
    "Conditions for raising the Station Outlier (SO) flag:\n",
    "\n",
    "* The median of the rolling Pearson correlation with all neighbouring stations within range `max_distance` is less than the threshold `gamma`.\n",
    "* The filter cannot be applied if fewer than `n_stat` neighbours are reporting data (the SO flag is set to -1).\n",
    "* The filter cannot be applied if fewer than `n_stat` neighbours have at least `mmatch` intervals overlapping with the evaluated station (the SO flag is set to -1).\n",
    "\n",
    "For the settings of the parameters `evaluation_period`, `mmatch`, `gamma` and `n_stat`, see Table 1 in https://agupubs.onlinelibrary.wiley.com/doi/full/10.1029/2019GL083731\n",
    "\n",
    "Note! The SO filter differs from the original R code. In the original implementation, any interval with at least `mrain` intervals of nonzero rainfall measurements is evaluated. In this implementation, only a fixed rolling time window is evaluated, so the `mrain` variable from the original code is not needed. In the original publication, the evaluation period (`evaluation_period`) is set to 4032 time steps, which is equivalent to two weeks of 5-minute data. Without the option of a variable evaluation period, two weeks is often too short, as there might not be enough wet periods in the last two weeks to calculate the correlation. This results in many -1 flags (filter cannot be applied). We suggest using a longer evaluation period, for example four weeks (`evaluation_period` = 8064 for 5-minute data).\n",
    "\n",
    "For the first `evaluation_period` time steps (here 8064), the rolling median correlation is computed with the last time steps of the time series. The resulting `median_corr_nbrs` should therefore be disregarded for the first `evaluation_period` time steps.\n",
    "\n",
    "`evaluation_period` is called `mint` in the original publication."
   ]
  },
  {
   "cell_type": "markdown",
   "id": "2a1b9843-2115-44a6-a83b-452df4908cd8",
   "metadata": {},
   "source": [
    "We initialize data variables for the resulting SO flags and the median Pearson correlation with neighbouring stations with the value -999. If the variables have the value 0 (passed the test), 1 (did not pass the test) or -1 (not enough information) after running the SO filter, we know that the time series have been evaluated. If the value is still -999, something went wrong and the data have not been processed.\n",
    "\n",
    "We also save the threshold `gamma` as a variable. This makes it easy to visualize when the median correlation with the neighbours drops below the threshold, which is the condition for raising an SO flag."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 10,
   "id": "aeda5019-ca1c-4e30-befc-46c10604be03",
   "metadata": {},
   "outputs": [],
   "source": [
    "evaluation_period = 8064\n",
    "mmatch = 200\n",
    "gamma = 0.15\n",
    "n_stat = 5\n",
    "max_distance = 10e3"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 11,
   "id": "b737e56e-bac6-474e-b8bd-33f9e14e9f21",
   "metadata": {},
   "outputs": [],
   "source": [
    "ds_pws[\"so_flag\"] = xr.DataArray(\n",
    "    np.ones((len(ds_pws.id), len(ds_pws.time))) * -999, dims=(\"id\", \"time\")\n",
    ")\n",
    "ds_pws[\"median_corr_nbrs\"] = xr.DataArray(\n",
    "    np.ones((len(ds_pws.id), len(ds_pws.time))) * -999, dims=(\"id\", \"time\")\n",
    ")\n",
    "ds_pws[\"gamma\"] = xr.DataArray(\n",
    "    np.ones((len(ds_pws.id), len(ds_pws.time))) * gamma, dims=(\"id\", \"time\")\n",
    ")"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "6dd02e85-676d-4e55-b81d-4e38bb0948b6",
   "metadata": {},
   "source": [
    "#### Run filter\n",
    "\n",
    "Here we run the filter on one month of data (May 2018) to test the run time."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 16,
   "id": "366e2c70-68cc-45c3-b0d1-e7acbd4dbcb6",
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "CPU times: user 40.2 s, sys: 8.81 s, total: 49 s\n",
      "Wall time: 50.1 s\n"
     ]
    }
   ],
   "source": [
    "%%time\n",
    "\n",
    "ds_pws_filtered = pypwsqc.flagging.so_filter(\n",
    "    ds_pws.sel(time=\"2018-05\"),\n",
    "    distance_matrix,\n",
    "    evaluation_period,\n",
    "    mmatch,\n",
    "    gamma,\n",
    "    n_stat,\n",
    "    max_distance,\n",
    ")"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "dc8b8cac-e9e1-4d61-99d5-2f2d3f2dbc29",
   "metadata": {},
   "source": [
    "### Save filtered data"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 21,
   "id": "41f502d1-6d87-43c1-a0c9-76f4c737147f",
   "metadata": {},
   "outputs": [],
   "source": [
    "ds_pws_filtered.to_netcdf(\"filtered_dataset.nc\")"
   ]
  }
 ],
 "metadata": {
  "kernelspec": {
   "display_name": "Python 3 (ipykernel)",
   "language": "python",
   "name": "python3"
  },
  "language_info": {
   "codemirror_mode": {
    "name": "ipython",
    "version": 3
   },
   "file_extension": ".py",
   "mimetype": "text/x-python",
   "name": "python",
   "nbconvert_exporter": "python",
   "pygments_lexer": "ipython3",
   "version": "3.10.14"
  }
 },
 "nbformat": 4,
 "nbformat_minor": 5
}