
Milimetric (Dan Andreescu)
Staff Engineer (Data Engineering)

User Details

User Since
Oct 8 2014, 5:48 PM (512 w, 5 d)
Availability
Available
IRC Nick
Milimetric
LDAP User
Milimetric
MediaWiki User
Milimetric (WMF)

Recent Activity

Wed, Jul 31

Milimetric added a project to T217792: Add wikitech (labswiki) to the sqoop list: Data Products (Data Products Sprint 17).

@VirginiaPoundstone adding this to our sprint because it's basically a no-op and an easy resolution to an old task.

Wed, Jul 31, 6:31 PM · Data Products (Data Products Sprint 17), Patch-For-Review, Data-Engineering

Tue, Jul 30

Milimetric moved T371031: Spike: Deep Dive on Growthbook data pipeline from In Process to Code Review / Tech Input on the Data Products (Data Products Sprint 17) board.
Tue, Jul 30, 8:00 PM · Data Products (Data Products Sprint 17)
Milimetric updated the task description for T371031: Spike: Deep Dive on Growthbook data pipeline.
Tue, Jul 30, 8:00 PM · Data Products (Data Products Sprint 17)
Milimetric updated the task description for T337562: Decide how to split wmf database into functional areas.
Tue, Jul 30, 7:42 PM · Data Pipelines (Sprint 14)
Milimetric added a comment to T217792: Add wikitech (labswiki) to the sqoop list.

Oh! Thanks for the reminder, this is now available but not included in the sqoop lists. I'll make a patch, easy enough. labswiki seems to be available in both the analytics replicas and cloud replicas (as labswiki_p in the latter)

Tue, Jul 30, 2:08 PM · Data Products (Data Products Sprint 17), Patch-For-Review, Data-Engineering

Mon, Jul 29

Izno awarded T342267: Investigate surprising "10% Other" portion of Analytics Browsers report a Like token.
Mon, Jul 29, 10:06 PM · Data Products (Data Products Sprint 17), Patch-For-Review, Analytics-Data-Problem, MediaWiki-Platform-Team (Radar), Data-Engineering, Data-Engineering-Dashiki
Milimetric added a comment to T371031: Spike: Deep Dive on Growthbook data pipeline.

Some initial thoughts: https://docs.google.com/document/d/1NorKzBiQyz2nXCUUkGqdFP8SfPD0bzBMdfPO183dqNY/edit

Mon, Jul 29, 9:21 PM · Data Products (Data Products Sprint 17)
Milimetric added a project to T371099: No longer use removed cuc_actiontext column in analytics/refinery: Data Products (Data Products Sprint 17).

Adding this to our Sprint board as it needs review and deployment. The change looks good; we just have to alter the table and make sure the timing of all the jobs works out.

Mon, Jul 29, 8:18 PM · Data Products (Data Products Sprint 17), Trust and Safety Product Sprint (Sprint Koto (July 15 - July 26)), Data-Engineering, Patch-For-Review
Milimetric merged T371319: Update sqoop code to remove cuc_actiontext from query and table into T371099: No longer use removed cuc_actiontext column in analytics/refinery.
Mon, Jul 29, 8:17 PM · Data Products (Data Products Sprint 17), Trust and Safety Product Sprint (Sprint Koto (July 15 - July 26)), Data-Engineering, Patch-For-Review
Milimetric merged task T371319: Update sqoop code to remove cuc_actiontext from query and table into T371099: No longer use removed cuc_actiontext column in analytics/refinery.
Mon, Jul 29, 8:16 PM · CheckUser, DBA, Data-Engineering, Schema-change-in-production, Data Products
Milimetric placed T371319: Update sqoop code to remove cuc_actiontext from query and table up for grabs.
Mon, Jul 29, 8:05 PM · CheckUser, DBA, Data-Engineering, Schema-change-in-production, Data Products
Milimetric created T371319: Update sqoop code to remove cuc_actiontext from query and table.
Mon, Jul 29, 8:01 PM · CheckUser, DBA, Data-Engineering, Schema-change-in-production, Data Products
Milimetric moved T368253: MetricsPlatform: Add performance instrumentation from Code Review / Tech Input to In Process on the Data Products (Data Products Sprint 17) board.
Mon, Jul 29, 4:10 PM · Data Products (Data Products Sprint 17), Patch-For-Review, Metrics Platform Backlog
Milimetric moved T342267: Investigate surprising "10% Other" portion of Analytics Browsers report from Sprint Backlog to To Deploy on the Data Products (Data Products Sprint 17) board.
Mon, Jul 29, 4:07 PM · Data Products (Data Products Sprint 17), Patch-For-Review, Analytics-Data-Problem, MediaWiki-Platform-Team (Radar), Data-Engineering, Data-Engineering-Dashiki
Milimetric edited projects for T342267: Investigate surprising "10% Other" portion of Analytics Browsers report, added: Data Products (Data Products Sprint 17); removed Data Products (Data Products Sprint 16).
Mon, Jul 29, 3:24 PM · Data Products (Data Products Sprint 17), Patch-For-Review, Analytics-Data-Problem, MediaWiki-Platform-Team (Radar), Data-Engineering, Data-Engineering-Dashiki
Milimetric moved T342267: Investigate surprising "10% Other" portion of Analytics Browsers report from Sign Off to To Deploy on the Data Products (Data Products Sprint 16) board.

great, moving this to get deployed. Steps will be:

Mon, Jul 29, 3:22 PM · Data Products (Data Products Sprint 17), Patch-For-Review, Analytics-Data-Problem, MediaWiki-Platform-Team (Radar), Data-Engineering, Data-Engineering-Dashiki
Milimetric added a comment to T352650: Migrate current-generation dumps to run from our containerized images.

Just for the record, we met and discussed @Joe's proposal (this task's description) and were in general agreement that it's the best way forward. We have follow-up discussions to have and coordination to do, but we're aligned on the idea.

Mon, Jul 29, 3:04 PM · Data Products, Data-Platform-SRE, MW-on-K8s, Dumps-Generation, Release-Engineering-Team, serviceops

Sat, Jul 27

Milimetric added a comment to T364045: [Bug?] Can't find wikidatawiki on wmf.mediawiki_wikitext_history.

K, as a final update here, the pipeline is:

Sat, Jul 27, 12:37 PM · Wikidata, Wikidata Analytics, Data-Engineering (Q4 2024 April 1st - June 30th)

Fri, Jul 26

Milimetric added a comment to T364045: [Bug?] Can't find wikidatawiki on wmf.mediawiki_wikitext_history.

manually running this in a screen on an-launcher1002:

Fri, Jul 26, 6:30 PM · Wikidata, Wikidata Analytics, Data-Engineering (Q4 2024 April 1st - June 30th)
Milimetric added a comment to T364045: [Bug?] Can't find wikidatawiki on wmf.mediawiki_wikitext_history.

Ok, this ended up being very involved. I believe the root of all the confusion is that all the dumps jobs assume the PREVIOUS dump finished and work only on the CURRENT dump. So we ran around dumpsdata and snapshot hosts, hardcoding 20240701 where it was looking for "latest" and we're not sure whether we broke anything. At the end of the day, we basically figured that the snapshot1010 version of dumps files seemed all good, and we just rsynced them over to dumpsdata and clouddumps hosts. The rsync service that runs ALSO assumes this "latest" thing, but not for all files, just for the status files. So as far as we could tell everything was already rsynced except the status and html files. The monitor/html generation service ALSO assumes this "latest" thing so we weren't able to run that to generate the html, even after trying to hack it, but the html files were already on the snapshot hosts so we just moved those over with rsync too. The base rsync excludes json and html, so we just hacked it to include them instead.

Fri, Jul 26, 6:21 PM · Wikidata, Wikidata Analytics, Data-Engineering (Q4 2024 April 1st - June 30th)
Milimetric added a comment to T369868: Improve handling of delete, restore, and merge from incremental update.

Quick spike to size up how many revisions we're dealing with on a daily basis:

Fri, Jul 26, 1:52 PM · Dumps 2.0 (Kanban Board)

Thu, Jul 25

Milimetric claimed T371031: Spike: Deep Dive on Growthbook data pipeline.
Thu, Jul 25, 2:46 PM · Data Products (Data Products Sprint 17)
Milimetric created T371031: Spike: Deep Dive on Growthbook data pipeline.
Thu, Jul 25, 2:45 PM · Data Products (Data Products Sprint 17)
Milimetric added a comment to T364045: [Bug?] Can't find wikidatawiki on wmf.mediawiki_wikitext_history.

oof, I just realized this is for the month BEFORE. I see that's still in-process:

Thu, Jul 25, 2:06 PM · Wikidata, Wikidata Analytics, Data-Engineering (Q4 2024 April 1st - June 30th)

Wed, Jul 24

Milimetric created T370948: Spike: MediaWiki db schema and reconciliation.
Wed, Jul 24, 6:53 PM · Dumps 2.0 (Kanban Board)

Tue, Jul 23

Milimetric added a comment to T364045: [Bug?] Can't find wikidatawiki on wmf.mediawiki_wikitext_history.

Ok, dug into this a bit more. Looks like the job set up to import the dumps XML is running fine but the status file says wikidatawiki is still in progress. Specifically it says this:

Tue, Jul 23, 9:27 PM · Wikidata, Wikidata Analytics, Data-Engineering (Q4 2024 April 1st - June 30th)
Milimetric added a comment to T364045: [Bug?] Can't find wikidatawiki on wmf.mediawiki_wikitext_history.

The airflow sensor timed out. But I never saw an alert for it (maybe it was before this week). I cleared it and will report back here in a bit after it has a chance to think about running again.

Tue, Jul 23, 7:18 PM · Wikidata, Wikidata Analytics, Data-Engineering (Q4 2024 April 1st - June 30th)
Milimetric moved T362783: Add instrumentation for actor signatures from Ready to Deploy to Done on the Data-Engineering (Q1 2024 July 1st - September 30th) board.

Deployed, started job, waiting to see if it works.

Tue, Jul 23, 7:06 PM · Data-Engineering (Q1 2024 July 1st - September 30th), Patch-For-Review
Milimetric moved T362785: Add host level instrumentation on webrequest from Ready to Deploy to Done on the Data-Engineering (Q1 2024 July 1st - September 30th) board.

I deployed this and started the job, checking in now to make sure it runs.

Tue, Jul 23, 7:05 PM · Data-Engineering (Q1 2024 July 1st - September 30th), Patch-For-Review
Milimetric changed the visibility for F56356662: running_reconciliation_queries.py.
Tue, Jul 23, 5:42 PM

Mon, Jul 22

Milimetric added a comment to T369851: NEW BUG REPORT Mediawiki_history contains duplicate rows for some revisions.
  • wmf.mediawiki_history: duplicate revision/create records indeed exist, some have 4 copies and some 2 copies but all spot-checked duplicates come in even numbers
  • wmf_raw.mediawiki_revision: does not show the same duplication
  • analytics mysql replicas: the pages those revisions belong to were moved and had some delete/restore and delete/revision actions in the logging table
  • cloud replicas: agrees with analytics replicas
Mon, Jul 22, 9:59 PM · Data-Engineering (Q1 2024 July 1st - September 30th), Movement-Insights, Analytics-Data-Problem, Data-Platform
Milimetric added a comment to T353817: Create legacy EventLogging proxy HTTP intake (for MediaWikiPingback) endpoint to EventGate.

+1 to decom

Mon, Jul 22, 4:05 PM · Data-Engineering (Q1 2024 July 1st - September 30th), Patch-For-Review, MW-1.43-notes (1.43.0-wmf.8; 2024-06-04), MediaWiki-Platform-Team (Radar), Event-Platform, MediaWiki-General
Milimetric added a comment to T370394: Drop gb_by from globalblocks table.

We do not currently use globalblocks anywhere I know of or searched.

Mon, Jul 22, 3:53 PM · Data-Engineering, Schema-change-in-production, DBA
Milimetric added a comment to T352650: Migrate current-generation dumps to run from our containerized images.

Sorry to ask this very basic question, but I found a bunch of others didn't know: how exactly is Dumps blocking the php 8 upgrade? Like, if we leave everything exactly as-is and just upgrade PHP, would it not run the way it is currently set up? On the surface I see no big difference between the current setup and a containerized MW running on the same servers, so I'm curious about the nuance I'm missing here.

Mon, Jul 22, 12:50 PM · Data Products, Data-Platform-SRE, MW-on-K8s, Dumps-Generation, Release-Engineering-Team, serviceops
Milimetric added a comment to T365074: Requesting access to cassandra-staging-devs for milimetric.

Sorry, I just signed it. I'm sure I signed it, or some form of it, at some point before; I've been an employee for almost 12 years :P

Mon, Jul 22, 12:42 PM · SRE, SRE-Access-Requests

Fri, Jul 19

Milimetric created T370551: Bug: Cassandra Unique Devices not loading Wikifunctions mobile data.
Fri, Jul 19, 7:28 PM · Data-Platform
Milimetric moved T342267: Investigate surprising "10% Other" portion of Analytics Browsers report from Code Review / Tech Input to To Deploy on the Data Products (Data Products Sprint 16) board.

ok, moving to ready to deploy. I'm going to ping @Krinkle one more time for data review. I executed this as I was testing and the results are available in milimetric.browser_general_test. You can query this like this:

Fri, Jul 19, 4:29 PM · Data Products (Data Products Sprint 17), Patch-For-Review, Analytics-Data-Problem, MediaWiki-Platform-Team (Radar), Data-Engineering, Data-Engineering-Dashiki

Wed, Jul 17

Milimetric added a subtask for T369898: Reduce the number of resource_change and resource_purge events emitted due to template changes: T370298: Spike: estimate how much of resource_purge could be filtered out by joining to webrequest.
Wed, Jul 17, 3:15 PM · Essential-Work, MW-1.43-notes (1.43.0-wmf.16; 2024-07-30), serviceops, Performance Issue, MediaWiki-Engineering, MediaWiki-Core-HTTP-Cache, ChangeProp
Milimetric added a parent task for T370298: Spike: estimate how much of resource_purge could be filtered out by joining to webrequest: T369898: Reduce the number of resource_change and resource_purge events emitted due to template changes.
Wed, Jul 17, 3:14 PM · MediaWiki-Engineering
Milimetric created T370298: Spike: estimate how much of resource_purge could be filtered out by joining to webrequest.
Wed, Jul 17, 3:14 PM · MediaWiki-Engineering

Tue, Jul 16

Milimetric added a comment to T370108: Missed pageview data over API.

weird... quick steps as I look into this.

Tue, Jul 16, 10:14 PM · Analytics-Data-Problem, Data Products, Pageviews-API, Data-Engineering

Fri, Jul 12

Milimetric added a comment to T342267: Investigate surprising "10% Other" portion of Analytics Browsers report.

Ok, sent updated code, it's fast now due to a CACHE statement, but that doesn't change the query plan which is still absolutely nuts, check this out:

Fri, Jul 12, 2:46 PM · Data Products (Data Products Sprint 17), Patch-For-Review, Analytics-Data-Problem, MediaWiki-Platform-Team (Radar), Data-Engineering, Data-Engineering-Dashiki
Milimetric added a comment to T365487: Progress: Build a data visualization tool for the evolution of Wikipedia articles maintained by WikiProjects.

Quick spark-sql query to get link changes where someone tags a new wiki project on the talk page:

Fri, Jul 12, 2:32 PM · Outreachy (Round 28), Outreach-Programs-Projects

Thu, Jul 11

Milimetric moved T369868: Improve handling of delete, restore, and merge from incremental update from Sprint Backlog to In Process on the Dumps 2.0 (Kanban Board) board.
Thu, Jul 11, 8:36 PM · Dumps 2.0 (Kanban Board)
Milimetric created T369868: Improve handling of delete, restore, and merge from incremental update.
Thu, Jul 11, 8:36 PM · Dumps 2.0 (Kanban Board)
Milimetric moved T368176: [Dumps 2] Spike: Figure root causes of missing rows when doing reconciliation from Sprint Backlog to Done on the Dumps 2.0 (Kanban Board) board.

OK, so it seems most problems do indeed track back to not applying delete and restore events. It feels like we can mark this task complete. We can find a way to apply delete/restore/merge, and then run these queries again and see what we need to reconcile. The period I looked at above was 10 days of enwiki revisions. If anyone disagrees, do move this task back.

Thu, Jul 11, 5:01 PM · Event-Platform, Data-Engineering, Dumps 2.0 (Kanban Board)
Milimetric added a comment to T368176: [Dumps 2] Spike: Figure root causes of missing rows when doing reconciliation.

That one mismatch_page that has no other reason listed is apparently part of a merge, so if we're not following up on delete/restore properly then this makes perfect sense because merges are more complicated still. Here are the two pages involved and the logging table records for them:

Thu, Jul 11, 3:39 PM · Event-Platform, Data-Engineering, Dumps 2.0 (Kanban Board)
Milimetric added a comment to T368176: [Dumps 2] Spike: Figure root causes of missing rows when doing reconciliation.

Ok, I think I got this query to make sense... the results:

Thu, Jul 11, 3:13 PM · Event-Platform, Data-Engineering, Dumps 2.0 (Kanban Board)

Wed, Jul 10

Milimetric added a comment to T368176: [Dumps 2] Spike: Figure root causes of missing rows when doing reconciliation.

I am still trying to find an elegant way to change the queries and show all this, but I just wanted to share results so far:

Wed, Jul 10, 8:14 PM · Event-Platform, Data-Engineering, Dumps 2.0 (Kanban Board)
Milimetric updated the task description for T368303: REQUEST: Add Special:AllEvents to allowlist for campaigns-product pageview tracking.
Wed, Jul 10, 5:07 PM · Data Products (Data Products Sprint 17), Event-Discovery, Data-Platform
Milimetric added a comment to T342267: Investigate surprising "10% Other" portion of Analytics Browsers report.

Apologies for the week delay here, I was out sick, picking it back up soon.

Wed, Jul 10, 4:11 PM · Data Products (Data Products Sprint 17), Patch-For-Review, Analytics-Data-Problem, MediaWiki-Platform-Team (Radar), Data-Engineering, Data-Engineering-Dashiki

Tue, Jul 9

Milimetric updated the task description for T368782: MediaWiki Reconciliation API.
Tue, Jul 9, 4:10 PM · Data-Engineering (Q1 2024 July 1st - September 30th), Dumps 2.0 (Kanban Board)

Jul 3 2024

Milimetric set the point value for T342267: Investigate surprising "10% Other" portion of Analytics Browsers report to 13.
Jul 3 2024, 7:51 PM · Data Products (Data Products Sprint 17), Patch-For-Review, Analytics-Data-Problem, MediaWiki-Platform-Team (Radar), Data-Engineering, Data-Engineering-Dashiki
Milimetric added a comment to T358373: [Dumps 2] Reconcillation mechanism to detect and fetch missing/mismatched revisions.

Quick summary of last meeting. Luke started working on a draft of what we were talking about (see the reconciliation flow on https://miro.com/app/board/uXjVNfaohl0=/).

Jul 3 2024, 7:00 PM · Patch-For-Review, Dumps 2.0 (Kanban Board)

Jul 1 2024

Milimetric added a comment to T368176: [Dumps 2] Spike: Figure root causes of missing rows when doing reconciliation.

My first hunch, that the revisions were coming from only specific pages, is wrong:

Jul 1 2024, 3:23 PM · Event-Platform, Data-Engineering, Dumps 2.0 (Kanban Board)
Milimetric claimed T368176: [Dumps 2] Spike: Figure root causes of missing rows when doing reconciliation.
Jul 1 2024, 1:45 PM · Event-Platform, Data-Engineering, Dumps 2.0 (Kanban Board)

Jun 26 2024

Milimetric added a project to T368405: Special:Homepage is rendered much slower (<1 sec to 2+ sec): Data Products.
Jun 26 2024, 5:43 PM · Data-Platform-SRE (2024.07.08 - 2024.07.28), Growth-Team (FY2024-25 Q1 Sprint 1), MW-1.43-notes (1.43.0-wmf.11; 2024-06-25), Data Products, User-Michael, Data-Platform, Performance Issue, GrowthExperiments-Homepage
Milimetric added a comment to T365952: Special:Statistics disagrees with stats.wikimedia.org on the number of active users..

are we literally saying that we should just change the value of statistics-users-active to Editors? Code here: https://gerrit.wikimedia.org/g/mediawiki/core/+/1aa990f1725bf81caaf44527b9e778b5a8fe7e4d/languages/i18n/en.json#1950

Jun 26 2024, 4:07 PM · MediaWiki-Engineering, MediaWiki-Special-pages
Milimetric added a comment to T367781: Drop deprecated abuse filter fields on wmf wikis.

Thanks for pinging us, we don't use abuse filter tables anywhere I'm aware of, so this shouldn't affect us.

Jun 26 2024, 3:59 PM · Data-Engineering, Schema-change-in-production, DBA
Milimetric added a comment to T367856: Cleanup revision table schema.

Thanks for pinging us on this. The sqoop code should run without modification, so we're good downstream. Thank you!

Jun 26 2024, 3:57 PM · Schema-change-in-production, Data-Engineering, DBA, Data Products
Milimetric added a comment to T364548: [SPIKE] Design API for the standardised page lifecycle instrument mixin.

obligatory reference: https://www.mediawiki.org/wiki/Extension:NavigationTiming (is this roughly related?)

Jun 26 2024, 3:44 PM · Data Products, Patch-For-Review, Metrics Platform Backlog

Jun 25 2024

Milimetric claimed T366944: MPIC: Enable API to return sample rates per wiki.
Jun 25 2024, 5:43 PM · Data Products (Data Products Sprint 15), Metrics Platform Backlog
Milimetric closed T367526: Cloud VPS "dashiki" project Buster deprecation as Resolved.
Jun 25 2024, 12:59 PM · Cloud-VPS (Debian Buster Deprecation)
Milimetric added a comment to T367526: Cloud VPS "dashiki" project Buster deprecation.

This is now done.

Jun 25 2024, 12:59 PM · Cloud-VPS (Debian Buster Deprecation)
Milimetric updated the task description for T367526: Cloud VPS "dashiki" project Buster deprecation.
Jun 25 2024, 12:59 PM · Cloud-VPS (Debian Buster Deprecation)
Milimetric added a comment to T342267: Investigate surprising "10% Other" portion of Analytics Browsers report.

Great question, @mforns. This was mostly for performance reasons. I couldn't find a way to get Spark to optimally work on the full day of pageviews without first aggregating it like this to > 250. But the execution plan I ended up with looks pretty wild. Let's talk tomorrow when you have some time. I'm attaching the change here.

Jun 25 2024, 12:50 AM · Data Products (Data Products Sprint 17), Patch-For-Review, Analytics-Data-Problem, MediaWiki-Platform-Team (Radar), Data-Engineering, Data-Engineering-Dashiki

Jun 24 2024

Milimetric moved T368183: MPIC: Build Location + Sample Rates component from Sprint Backlog to In Process on the Data Products (Data Products Sprint 15) board.
Jun 24 2024, 4:15 PM · Metrics Platform Backlog, Data Products (Data Products Sprint 15)
Milimetric claimed T367526: Cloud VPS "dashiki" project Buster deprecation.

I've migrated and shut off the old instances. I will delete them in a couple of days, just in case. But everything's working fine without them. Did not know about the wmflabs -> wmcloud automatic redirect, that made everything very simple.

Jun 24 2024, 2:46 PM · Cloud-VPS (Debian Buster Deprecation)
Milimetric added a comment to T366004: Add page-title to the x_analytics header.

I grouped a couple of tasks under this so we're less likely to lose them in the fray.

Jun 24 2024, 1:52 PM · Data-Engineering
Milimetric added subtasks for T366004: Add page-title to the x_analytics header: T304362: Pageview definition relies on X-Analytics to determine special pages, T240676: Develop a consistent rule for which special pages count as pageviews.
Jun 24 2024, 1:49 PM · Data-Engineering
Milimetric added a parent task for T304362: Pageview definition relies on X-Analytics to determine special pages: T366004: Add page-title to the x_analytics header.
Jun 24 2024, 1:49 PM · Analytics-Data-Problem, Patch-Needs-Improvement, Data-Platform-SRE
Milimetric added a parent task for T240676: Develop a consistent rule for which special pages count as pageviews: T366004: Add page-title to the x_analytics header.
Jun 24 2024, 1:49 PM · Movement-Insights, Data-Engineering-Icebox, Campaign-Registration

Jun 21 2024

Milimetric added a comment to T342267: Investigate surprising "10% Other" portion of Analytics Browsers report.

The simpler way to do this, just two phases as opposed to progressive, gets us fairly similar results, with about 200 fewer rows, all of which detail specific browser versions.

Jun 21 2024, 8:12 PM · Data Products (Data Products Sprint 17), Patch-For-Review, Analytics-Data-Problem, MediaWiki-Platform-Team (Radar), Data-Engineering, Data-Engineering-Dashiki
Milimetric added a comment to T342267: Investigate surprising "10% Other" portion of Analytics Browsers report.

We get a ton more detailed results this way, and the total coverage increases to 99.7%. Still not 99.9%, but I think we may have too much detail at some point. I'm fairly happy with these results, and I'm going to prepare the new browser general query as a gerrit change. It'll be good to get some review.

Jun 21 2024, 8:03 PM · Data Products (Data Products Sprint 17), Patch-For-Review, Analytics-Data-Problem, MediaWiki-Platform-Team (Radar), Data-Engineering, Data-Engineering-Dashiki
Milimetric moved T368113: Design and merge the new tables of file tables from Incoming (new tickets) to To be estimated/discussed on the Data-Engineering board.

This might affect some data we sqoop into HDFS and some of how we compute commons impact metrics or similar future metrics. We have to wait until a schema change is proposed to know for sure.

Jun 21 2024, 6:38 PM · Data-Engineering, Data Products, Schema-change, DBA
Milimetric added a comment to T342267: Investigate surprising "10% Other" portion of Analytics Browsers report.

From a discussion with @Krinkle about the data, a preliminary idea of how to roll up is:

Jun 21 2024, 3:09 PM · Data Products (Data Products Sprint 17), Patch-For-Review, Analytics-Data-Problem, MediaWiki-Platform-Team (Radar), Data-Engineering, Data-Engineering-Dashiki

Jun 20 2024

Milimetric moved T342267: Investigate surprising "10% Other" portion of Analytics Browsers report from Paused to Code Review / Tech Input on the Data Products (Data Products Sprint 15) board.
Jun 20 2024, 9:17 PM · Data Products (Data Products Sprint 17), Patch-For-Review, Analytics-Data-Problem, MediaWiki-Platform-Team (Radar), Data-Engineering, Data-Engineering-Dashiki
Milimetric added a comment to T342267: Investigate surprising "10% Other" portion of Analytics Browsers report.

The long and the short of it is that we can get that "other" to about 2% if we simply roll up remaining data by browser family and os family. We could get fancier but let's see what folks think about just this approach.

Jun 20 2024, 9:17 PM · Data Products (Data Products Sprint 17), Patch-For-Review, Analytics-Data-Problem, MediaWiki-Platform-Team (Radar), Data-Engineering, Data-Engineering-Dashiki
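
The roll-up described in the comment above (folding remaining data into browser family and os family) could be sketched roughly as follows. This is a minimal illustration, not the query from the actual change: the row layout, the `threshold` parameter, and the function name `roll_up` are all assumptions made for the example, and rows are assumed to be pre-aggregated so each (os family, browser family, version) appears once.

```python
from collections import defaultdict


def roll_up(rows, threshold):
    """Two-phase roll-up of (os_family, browser_family, version, views) rows.

    Hypothetical helper: rows at or above `threshold` keep their version
    detail; the rest are summed per (os_family, browser_family). Family
    totals still below `threshold` are folded into a single "other" count.
    """
    detailed = {}
    by_family = defaultdict(int)
    for os_family, browser_family, version, views in rows:
        if views >= threshold:
            detailed[(os_family, browser_family, version)] = views
        else:
            by_family[(os_family, browser_family)] += views

    other = 0
    for (os_family, browser_family), views in by_family.items():
        if views >= threshold:
            # Keep the family, but drop the version detail.
            detailed[(os_family, browser_family, "Other")] = views
        else:
            other += views
    return detailed, other


# Made-up sample data, only to show the shape of the result.
sample = [
    ("Windows", "Chrome", "126", 1000),
    ("Windows", "Chrome", "90", 40),
    ("Windows", "Chrome", "91", 70),
    ("Linux", "Lynx", "2", 5),
]
detailed, other = roll_up(sample, threshold=100)
total = sum(detailed.values()) + other
print(detailed)
print(f"other share: {other / total:.2%}")
```

With this shape, small version-level rows that would otherwise land in "other" are still attributed to their browser and OS family, which is the effect the comment describes; the threshold that gives about 2% on real data is not something this sketch can tell you.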
Milimetric added a comment to T342267: Investigate surprising "10% Other" portion of Analytics Browsers report.

This spreadsheet (1) has all the different aggregations in separate sheets. The name of the sheet is the aggregation type. Described here:

Jun 20 2024, 8:32 PM · Data Products (Data Products Sprint 17), Patch-For-Review, Analytics-Data-Problem, MediaWiki-Platform-Team (Radar), Data-Engineering, Data-Engineering-Dashiki
Milimetric added a comment to T342267: Investigate surprising "10% Other" portion of Analytics Browsers report.

ok, I have some results for us to peruse, from rolling up in different ways. First of all, my query so we can debate whether or not it's accurate.

Jun 20 2024, 8:26 PM · Data Products (Data Products Sprint 17), Patch-For-Review, Analytics-Data-Problem, MediaWiki-Platform-Team (Radar), Data-Engineering, Data-Engineering-Dashiki

Jun 17 2024

Milimetric claimed T367810: Spike: Can we recreate a skeleton page_change (revision_change) event from DB replica alone?.

Analysis available in this spreadsheet: https://docs.google.com/spreadsheets/d/1iSlH5XsRXV7mDoku0F5HbLNJmx1CMBm6ECakZMPUbU8/edit?usp=sharing

Jun 17 2024, 7:43 PM · Dumps 2.0 (Kanban Board)
Milimetric added a comment to T342267: Investigate surprising "10% Other" portion of Analytics Browsers report.

Made up some slides to help think about this data:

Jun 17 2024, 4:53 PM · Data Products (Data Products Sprint 17), Patch-For-Review, Analytics-Data-Problem, MediaWiki-Platform-Team (Radar), Data-Engineering, Data-Engineering-Dashiki
Milimetric moved T342267: Investigate surprising "10% Other" portion of Analytics Browsers report from In Process to Paused on the Data Products (Data Products Sprint 15) board.
Jun 17 2024, 3:06 PM · Data Products (Data Products Sprint 17), Patch-For-Review, Analytics-Data-Problem, MediaWiki-Platform-Team (Radar), Data-Engineering, Data-Engineering-Dashiki

Jun 12 2024

Milimetric claimed T342267: Investigate surprising "10% Other" portion of Analytics Browsers report.
Jun 12 2024, 2:02 PM · Data Products (Data Products Sprint 17), Patch-For-Review, Analytics-Data-Problem, MediaWiki-Platform-Team (Radar), Data-Engineering, Data-Engineering-Dashiki
Milimetric moved T342267: Investigate surprising "10% Other" portion of Analytics Browsers report from Sprint Backlog to In Process on the Data Products (Data Products Sprint 15) board.
Jun 12 2024, 2:02 PM · Data Products (Data Products Sprint 17), Patch-For-Review, Analytics-Data-Problem, MediaWiki-Platform-Team (Radar), Data-Engineering, Data-Engineering-Dashiki

Jun 11 2024

Milimetric added a comment to T358373: [Dumps 2] Reconcillation mechanism to detect and fetch missing/mismatched revisions.

This looks like a great system to get started with. I can think of some potential snags that come up, so as we build it let's keep an eye out for these and similar:

Jun 11 2024, 7:01 PM · Patch-For-Review, Dumps 2.0 (Kanban Board)
Milimetric moved T366759: MPIC: Template form should update on post from In Process to Code Review / Tech Input on the Data Products (Data Products Sprint 14) board.
Jun 11 2024, 12:49 AM · Data Products (Data Products Sprint 15), Metrics Platform Backlog
Milimetric moved T366758: MPIC: Modify form should prepopulate on instrument select when toggling between functions from In Process to Code Review / Tech Input on the Data Products (Data Products Sprint 14) board.
Jun 11 2024, 12:49 AM · Data Products (Data Products Sprint 15), Metrics Platform Backlog

Jun 7 2024

Milimetric added a comment to T364872: Unique devices per country spikes on wikifunctions .
select day, http_status, count(*) count_by_status
  from pageview_actor
 where year=2024 and month=4 and day in (19,26)
   and geocoded_data['country_code'] = 'HK'
   and normalized_host.project_class = 'wikifunctions'
 group by day, http_status
day  http_status  count_by_status
19   200          389
19   301          4311
19   302          931
26   200          198
26   301          133801
26   302          1028
Jun 7 2024, 6:29 PM · Abstract Wikipedia team, Movement-Insights, Analytics-Data-Problem, Data-Platform
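
To compare the two days in the T364872 result above, it can help to turn the counts into per-day shares. The snippet below is only a sketch: the rows are copied by hand from that comment, and the helper name `status_shares` is made up for this example.

```python
# Rows copied from the query result: (day, http_status, count_by_status).
rows = [
    (19, 200, 389), (19, 301, 4311), (19, 302, 931),
    (26, 200, 198), (26, 301, 133801), (26, 302, 1028),
]


def status_shares(rows):
    """Return {(day, http_status): share of that day's total count}."""
    totals = {}
    for day, _status, count in rows:
        totals[day] = totals.get(day, 0) + count
    return {(day, status): count / totals[day] for day, status, count in rows}


shares = status_shares(rows)
for (day, status), share in sorted(shares.items()):
    print(f"day {day} status {status}: {share:.1%}")
```

This only restates the numbers already in the comment as proportions; it does not by itself explain the spike.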

Jun 6 2024

Milimetric created T366820: Wikistats Link with Language Option.
Jun 6 2024, 3:52 PM · Data Pipelines, Data-Engineering, Data Products, Data-Engineering-Wikistats

Jun 5 2024

Milimetric awarded T239378: Disable parent task metadata by default for new sub tasks a Like token.
Jun 5 2024, 8:56 PM · Patch-For-Review, User-brennen, Release-Engineering-Team, Phabricator, Developer Productivity
Milimetric updated the task description for T366720: Public DataHub.
Jun 5 2024, 4:33 PM · Data Products, Data-Engineering
Milimetric created T366720: Public DataHub.
Jun 5 2024, 4:12 PM · Data Products, Data-Engineering

Jun 4 2024

Milimetric moved T366604: [MPIC] Access to MPIC App within WMF systems from In Process to Done on the Data Products (Data Products Sprint 14) board.

https://mpic.svc.eqiad.wmnet:30443/ is the endpoint we should use to talk to the mpic service, so that we don't take unnecessary hops through DNS.

Jun 4 2024, 3:19 PM · Data-Platform-SRE (2024.06.17 - 2024.07.07), Data Products (Data Products Sprint 14)

May 31 2024

Milimetric added a comment to T366369: MaxMind seems to be mapping the same IP to different countries.

hypothesis so far: maybe some workers are getting MaxMind updates on a staggered schedule from others, so there's always some variation?

May 31 2024, 3:58 PM · Data-Engineering
Milimetric added a project to T366369: MaxMind seems to be mapping the same IP to different countries: Data-Engineering.

(and sub-country it's much worse)

May 31 2024, 3:57 PM · Data-Engineering
Milimetric created T366369: MaxMind seems to be mapping the same IP to different countries.
May 31 2024, 3:57 PM · Data-Engineering

May 29 2024

Milimetric moved T360914: Update Dashiki Cloud Instances from In Process to Sign Off on the Data Products (Data Products Sprint 14) board.

The two instances have been moved, docs updated on wiki and in code, and proxies have been moved. The only problem is the new proxies can't use the old wmflabs.org domain. For now, I left the old proxies up and additionally set up the new proxies. So, for example, both https://pingback.wmflabs.org/ and https://pingback.wmcloud.org/ work. Whenever the old instances are deleted, the old URLs will stop working. I guess part of sign-off will be to communicate this and maybe delete the old instances?

May 29 2024, 10:02 PM · Data-Engineering, Data-Engineering-Dashiki, Data Products (Data Products Sprint 14)
Milimetric added a comment to T360914: Update Dashiki Cloud Instances.

Keeping track of how I do this for future reference. (The previous task where I did this was T236586 and I failed to take good notes there)

May 29 2024, 9:30 PM · Data-Engineering, Data-Engineering-Dashiki, Data Products (Data Products Sprint 14)
Milimetric moved T360914: Update Dashiki Cloud Instances from Sprint Backlog to In Process on the Data Products (Data Products Sprint 14) board.
May 29 2024, 3:41 PM · Data-Engineering, Data-Engineering-Dashiki, Data Products (Data Products Sprint 14)