Friday, October 28, 2016

Usability improvements for Firefox automation initiative - Status update #8

NEW:
* Debugging on remote workers is no longer a main priority for this quarter, since that work has been completed

* We've added a new project: investigating hyper chunking

In this update we will look at the progress made in the last two weeks.

This quarter’s main focus is on improving end to end times on Try (Thunder Try project).

For all bugs and priorities, you can check out the project management page.

Thunder Try - Improve end to end times on try
---------------------------------------------

Project #1 - Artifact builds on automation
##########################################

Accomplished recently:
  • All --artifact build jobs now provide crash-reporter symbols. This means test jobs scheduled against linux64 debug work with --artifact.

Upcoming:
  • Patch in review - debug artifact builds on try. (Right now --artifact always results in an opt artifact build.)

Project #2 - S3 Cloud Compiler Cache
####################################

Accomplished recently:
  • Green builds on all platforms on Try; next we plan to compare build times against the existing Python sccache

Project #3 - Metrics
####################

Accomplished recently:

Upcoming:
  • We still have the “small population” problem, where outliers make the 90th percentile look big.
  • Add a TaskCluster view of End-to-End times
  • Start knocking off bugs in the dashboard
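To see why small populations are a problem, here is a minimal Python sketch using the nearest-rank percentile. The durations are made-up illustrative numbers, not real Try data, and the dashboard's own percentile computation may differ. With only five runs, a single slow run becomes the 90th percentile, while the median barely moves:

```python
import math
import statistics


def nearest_rank_percentile(values, pct):
    """Return the pct-th percentile (0 < pct <= 100) using the nearest-rank method."""
    ordered = sorted(values)
    # Nearest rank: the smallest value with at least pct% of the samples at or below it.
    rank = max(1, math.ceil(pct * len(ordered) / 100))
    return ordered[rank - 1]


# Hypothetical end-to-end times in minutes.
rare_suite = [42, 45, 47, 50, 180]       # 5 runs, one outlier
common_suite = [45] * 95 + [180] * 5     # 100 runs, five outliers

for name, times in [("rare", rare_suite), ("common", common_suite)]:
    print(name,
          "p90 =", nearest_rank_percentile(times, 90),
          "median =", statistics.median(times))
```

The rare suite reports a p90 of 180 against a median of 47. The common suite, which has the same kind of outliers, reports a p90 of 45.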

Project #4 - Build automation improvements
##########################################
Nothing new in this edition.

Project #5 - Hyper chunking
###########################

Nothing new in this edition.

Project #6 - Run Web platform tests from the source checkout
############################################################

This feature is blocked on misconfigured EBS volumes in AWS (bug 1305174).


Creative Commons License
This work by Zambrano Gasparnian, Armen is licensed under a Creative Commons Attribution-Noncommercial-Share Alike 3.0 Unported License.

Wednesday, October 12, 2016

Usability improvements for Firefox automation initiative - Status update #7

In this update we will look at the progress made in the last two weeks.

A reminder that this quarter’s main focus is on:
  • Debugging tests on interactive workers (only Linux on TaskCluster)
  • Improve end to end times on Try (Thunder Try project)

For all bugs and priorities, you can check out the project management page.

Status update:
Debugging tests on interactive workers
---------------------------------------------------

Accomplished recently:
  • No new progress

Upcoming:
  • Android xpcshell
  • Blog/newsgroup post


Thunder Try - Improve end to end times on try
---------------------------------------------

Project #1 - Artifact builds on automation
##########################################

Accomplished recently:
  • The following platforms are now supported: linux, linux64, macosx64, win32, win64
  • An option was added to download symbols for our compiled artifacts during the artifact build

Upcoming:
  • Debug artifact builds on try. (Right now --artifact always results in an opt artifact build.)
  • Android artifact builds on try, thanks to nalexander.

Project #2 - S3 Cloud Compiler Cache
####################################

Some of the issues found last quarter for this project were around NSS, which was also in need of replacing. This project was put on hold until the NSS work was completed. We’re going to resume it in Q4.

Project #3 - Metrics
####################

Accomplished recently:

Upcoming:
  • Figure out what to do with these small populations:
    • Ignore them - too small to be statistically significant
    • Aggregate them - All the rarely run suites can be pushed into an “Other” category
    • Show some other statistic: maybe the median is better?
    • Show median of past day, and 90% for the week:  That can show the longer trend, and short term situation, for better overall feel.
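The “aggregate them” and “median” options could be combined roughly as shown here. This is only a sketch. `summarize` and the 20-run threshold are hypothetical choices and do not reflect what the dashboard does today.

```python
import statistics
from collections import defaultdict

# Hypothetical cut-off: suites with fewer runs than this are folded into "Other".
MIN_SAMPLES = 20


def summarize(times_by_suite, min_samples=MIN_SAMPLES):
    """Map each suite to its median time, pooling rarely run suites into "Other"."""
    grouped = defaultdict(list)
    for suite, times in times_by_suite.items():
        key = suite if len(times) >= min_samples else "Other"
        grouped[key].extend(times)
    return {suite: statistics.median(times) for suite, times in grouped.items()}


# Made-up sample data (minutes).
sample = {
    "mochitest": [30] * 25,
    "rare-a": [10, 200],
    "rare-b": [12, 14, 16],
}
print(summarize(sample))
```

Pooling keeps the rare suites visible as a group. Using the median stops a single 200-minute run from dominating the reported number.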

Project #4 - Build automation improvements
##########################################
Accomplished recently:
  • Bug 1306167 - Updated build machines to use SSDs. Linux PGO builds now take half the time

https://bugzilla.mozilla.org/show_bug.cgi?id=1306167#c11

Project #5 - Run Web platform tests from the source checkout
############################################################
Nothing new in this edition.



Wednesday, September 28, 2016

[NEW] Added build status updates - Usability improvements for Firefox automation initiative - Status update #6

[NEW] Starting with this newsletter, we will also report build automation improvements, since they help with the end-to-end time of your pushes

In this update we will look at the progress made in the last two weeks.

A reminder that this quarter’s main focus is on:
  • Debugging tests on interactive workers (only Linux on TaskCluster)
  • Improve end to end times on Try (Thunder Try project)

For all bugs and priorities, you can check out the project management page.

Status update:
Debugging tests on interactive workers
---------------------------------------------------

Accomplished recently:
  • Fixed regression that broke the interactive wizard
  • Support for Android reftests landed

Upcoming:
  • Support for Android xpcshell
  • Video demonstration


Thunder Try - Improve end to end times on try
---------------------------------------------

Project #1 - Artifact builds on automation
##########################################

Accomplished recently:
  • Windows and Mac artifact builds are soon to land
  • |mach try| now supports --artifact option
  • Compiled-code test jobs now error out early when run with --artifact on try

Upcoming:
  • Windows and Mac artifact builds available on Try
  • Fix triggering of test jobs on Buildbot with artifact build

Project #2 - S3 Cloud Compiler Cache
####################################

Nothing new in this edition.

Project #3 - Metrics
####################

Accomplished recently:

  • Drill-down charts:

  • Which lead to a detailed view:

  • With optional wait times included (missing 10% outliers, so looks almost the same):


Upcoming:
  • Iron out interactivity bugs
  • Show outliers
  • Post these (static) pages to my people page
  • Fix ActiveData production to handle these queries (I am currently using a development version of ActiveData, but that version has some nasty anomalies)

Project #4 - Build automation improvements
##########################################
Upcoming:


Project #5 - Run Web platform tests from the source checkout
############################################################
Accomplished recently:
  • WPT is now running from the source checkout in automation

Upcoming:
  • There are still parts of automation relying on a test zip. The next step is to minimize those so that you can get a loaner, pull any revision from any repo, and test WPT changes in exactly the environment the automation tests run in.

Other
#####
  • Bug 1300812 - Make Mozharness download and unpack actions better handle intermittent S3/EC2 issues
    • This adds retry logic to reduce intermittent oranges
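A small exponential-backoff helper illustrates the retry idea behind bug 1300812. This is a generic Python sketch, not Mozharness's actual code. The `retry` name, its parameters and the choice of `OSError` as the retryable error are assumptions made for illustration.

```python
import time


def retry(action, attempts=5, sleeptime=1.0, backoff=2.0,
          retry_exceptions=(OSError,), sleep=time.sleep):
    """Call action() up to `attempts` times (attempts >= 1), sleeping between failures.

    The delay starts at `sleeptime` seconds and is multiplied by `backoff`
    after every failed attempt. The last failure is re-raised.
    """
    for attempt in range(1, attempts + 1):
        try:
            return action()
        except retry_exceptions:
            if attempt == attempts:
                raise
            sleep(sleeptime)
            sleeptime *= backoff
```

Passing `sleep` in as a parameter lets tests check the backoff schedule without actually waiting.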



Tuesday, September 13, 2016

Increasing test coverage

Last quarter I spent some time increasing mozci's test coverage. Here are some notes I took to help me remember in the future how to do it.

Here's some of what I did:
  • Read Python's page about increasing test coverage
    • I wanted to learn what core Python recommends
    • They recommend using coverage.py
  • Quick start with coverage.py
    • "coverage run --source=mozci -m py.test test" to gather data
    • "coverage html" to generate an html report
    • "/path/to/firefox htmlcov/index.html" to see the report
  • NOTE: We have coverage reports from automation in coveralls.io
  • If you find code that needs to be ignored, read coverage.py's documentation on excluding code.
    • Use "# pragma: no cover" in specific lines
    • You can also create rules of exclusion
  • Once you get closer to 100%, you might want to consider increasing branch coverage instead of line coverage
    • Read more in coverage.py's branch coverage documentation.
  • Once you pick a module to increase coverage
    • Keep making changes and re-running "coverage run" and "coverage html".
    • Reload the html page to see the new results

After some work on this, I realized that I prefer to focus on improving the simplest unit tests. Integration tests need proper thought about how to test them well, rather than *just* increasing coverage for the sake of it.
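Here is a toy module (hypothetical, not from mozci) that makes the line-versus-branch distinction concrete. A single test that passes `True` executes every line of `build_label`, so line coverage reports 100%. Running it with `coverage run --branch` would still flag the `if` as a partial branch, because the path where the condition is false is never taken. The `# pragma: no cover` comment on the `if __name__ == "__main__":` line excludes that whole clause from the report.

```python
def build_label(is_debug):
    """Return the build type label used in (hypothetical) job names."""
    label = "opt"
    if is_debug:
        label = "debug"
    return label


def test_debug_label():
    # Runs every line of build_label, but never the "is_debug is False" path.
    assert build_label(True) == "debug"


def test_opt_label():
    # Adding this test covers the missing branch.
    assert build_label(False) == "opt"


if __name__ == "__main__":  # pragma: no cover
    print(build_label(False))
```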



Tuesday, August 02, 2016

Usability improvements for Firefox automation initiative - Status update #2

In this update, we will look at the progress made since our initial update.

A reminder that this quarter’s main focus is on:
  • Debugging tests on interactive workers (only Linux on TaskCluster)
  • Improve end to end times on Try (Thunder Try project)

For all bugs and priorities, you can check out the project management page.

Debugging tests on interactive workers

Accomplished recently:
  • Bug 1285582 - Fixed Xvfb startup issue
  • Bug 1288827 - Improved mochitest UX (no longer need --appname, paths normalized)
  • Bug 1289879 - Uses mozharness venv if available

Upcoming:
  • Support for smaller test harnesses (Cpp, Mn, wpt, etc)
  • Improved one-click-loaner UX


Thunder Try - Improve end to end times on try

Project #1 - Artifact builds on automation

No news for this edition and probably the next one.


Project #2 - S3 Cloud Compiler Cache

Accomplished recently:
  • Working on testing sccache re-write on Try
  • More news in the following update


Project #3 - Metrics

Accomplished recently:
  • Bug 1242017 - Metrics team will configure ingestion point into Telemetry

Upcoming:
  • Bug 1258861 - Working on the underlying data model at the moment


Other
  • Bug 1287604 - Experiment with different AWS instance types for TC linux64 builds
    • Some initial experiments have shown we can shave 20 minutes off an average linux64 build by using more powerful AWS instances, with a reasonable cost tradeoff. We’ll start the work of migrating to these new instances soon.
  • Bug 1272083 - Downloading and unzipping should be performed as data is received
  • Bug 1286336 - Improve interaction of automation with version control
    • Buildbot AMIs now seeded with mozilla-unified repo (Bug 1232442)
    • TaskCluster decision and various lint/test tasks now use `hg robustcheckout` and share caches more optimally (Bug 1247168)
      • Flake8 tasks now complete in as little as 9s (~3m before)
      • Decision tasks now complete in <60s on average
    • Some TaskCluster tasks now share VCS checkouts on Try (Bug 1289643)
      • Tasks will complete faster on Try due to not having to perform full VCS checkout
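Bug 1272083 is about overlapping the download with decompression instead of waiting for the whole archive to arrive. The sketch shows the idea for a gzip stream using Python's `zlib`. The chunk iterable stands in for a network response, and this illustrates the technique rather than reproducing the code that landed for that bug. Plain `.zip` archives keep their central directory at the end of the file, which makes them awkward to extract while still downloading. Streaming formats such as `.tar.gz` are a better fit.

```python
import gzip
import zlib


def decompress_stream(chunks):
    """Yield decompressed bytes as each gzip-compressed chunk arrives."""
    # 16 + MAX_WBITS tells zlib to expect a gzip header and trailer.
    decompressor = zlib.decompressobj(16 + zlib.MAX_WBITS)
    for chunk in chunks:
        data = decompressor.decompress(chunk)
        if data:
            yield data
    tail = decompressor.flush()
    if tail:
        yield tail


# Simulate a download arriving in 512-byte pieces.
payload = b"test log line\n" * 1000
compressed = gzip.compress(payload)
pieces = (compressed[i:i + 512] for i in range(0, len(compressed), 512))
restored = b"".join(decompress_stream(pieces))
print(len(restored) == len(payload))
```

In a real download, `chunks` could come from something like `requests`' `Response.iter_content()` on a streamed request.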

[3] https://bugzilla.mozilla.org/show_bug.cgi?id=1247168



Thursday, July 21, 2016

Mozci and pulse actions contributions opportunities

We've recently finished a round of feature development on pulse_actions, adding TaskCluster support for adding new jobs to Treeherder.

I'm now looking at what optimizations or features are left to complete. If you would like to contribute, feel free to let me know.

Here's some highlighted work (based on pulse_actions issues and bugs):
This will help us save money in Heroku since using Buildapi + buildjson files is memory hungry and requires us to use bigger Heroku nodes.
This is important to help us change the behaviour of the Heroku app without having to commit any code. I've used this in the past to modify the logging level when debugging an issue.

This is also useful if we want to have different pipelines in Heroku. 
Having Heroku pipelines helps us test different versions of the software.
This is useful if we want to have a version running from 'master' against the staging version of Treeherder.
It would also help contributors to have a version of their pull requests running live.
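One common way to change a Heroku app's behaviour without committing code is through config vars. These reach the app as environment variables and can be changed with `heroku config:set`. Here is a minimal sketch of reading the logging level that way. The `LOG_LEVEL` variable and the logger name are assumptions for illustration, not pulse_actions' actual settings.

```python
import logging
import os


def configure_logging(env=None):
    """Set the app logger's level from the (hypothetical) LOG_LEVEL variable."""
    env = os.environ if env is None else env
    level_name = env.get("LOG_LEVEL", "INFO").upper()
    level = getattr(logging, level_name, None)
    # Fall back to INFO for unknown or non-level names such as "BOGUS".
    if not isinstance(level, int):
        level = logging.INFO
    logging.getLogger("pulse_actions").setLevel(level)
    return level


if __name__ == "__main__":
    # Heroku restarts the app after `heroku config:set LOG_LEVEL=DEBUG`,
    # so the new level is picked up without a code change.
    print(logging.getLevelName(configure_logging()))
```

Each app in a pipeline (for example, a staging app) can carry its own config vars, so the same code can behave differently per app.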
We don't have any tests running. We need to determine how to run a minimum set of tests to have some confidence in the product.

This needs integration tests of Pulse messages.
The comment in the bug is rather accurate, and it shows that there are many small things that need fixing.
Manual backfilling uses Buildapi to schedule jobs. If we switched to scheduling via TaskCluster/Buildbot-bridge we would get better results since we can guarantee proper scheduling of a build + associated dependent jobs. Buildapi does not give us this guarantee. This is mainly useful when backfilling PGO test and talos jobs.

If instead you're interested in contributing to mozci, you can have a look at the issues.



Tuesday, July 19, 2016

Usability improvements for Firefox automation initiative - Status update #1

The developer survey conducted by Engineering Productivity last fall indicated that debugging test failures that are reported by automation is a significant frustration for many developers. In fact, it was the biggest deficit identified by the survey. As a result, the Engineering Productivity Team (aka A-Team) is working on improving the user experience for debugging test failures in our continuous integration and speeding up the turnaround for Try server jobs.

This quarter’s main focus is on:
  • Debugging tests on interactive workers (only Linux on TaskCluster)
  • Improve end to end times on Try (Thunder Try project)

For all bugs and priorities, you can check out the project management page.

In this email you will find the progress we’ve made recently. In future updates you will see a delta from this email.

PS: These status updates will be fortnightly.


Debugging tests on interactive workers
Accomplished recently:
  • Landed support for running reftest and xpcshell via tests.zip
  • Many UX improvements to the interactive loaner workflow

Upcoming:
  • Make sure Xvfb is running so you can actually run the tests!
  • Mochitest support + all other harnesses


Thunder Try - Improve end to end times on try

Project #1 - Artifact builds on automation

Accomplished recently:
  • Landed prerequisites for Windows and OS X artifact builds on try.
  • Identified which tests should be skipped with artifact builds

Upcoming:
  • Provide a try syntax flag to trigger only artifact builds instead of full builds; starting with opt Linux 64.


Project #2 - S3 Cloud Compiler Cache

Accomplished recently:
  • Sccache’s Rust rewrite has reached feature parity with the Python version of sccache
  • Now testing sccache2 on Try

Upcoming:
  • We want to roll out a two-tier sccache for Try, which will enable it to benefit from cache objects from integration branches
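A two-tier lookup could work roughly as shown here. The sketch uses plain dicts in place of S3 buckets. The class name and the rule that Try only ever writes to its own tier are assumptions for illustration. A common reason for such a split is to keep untrusted Try output out of the cache used by integration branches. This is not sccache's actual design.

```python
from typing import Dict, Optional


class TwoTierCache:
    """Hypothetical cache: read the Try tier first, then the integration tier."""

    def __init__(self, try_tier: Dict[str, bytes],
                 integration_tier: Dict[str, bytes]) -> None:
        self.try_tier = try_tier
        self.integration_tier = integration_tier

    def get(self, key: str) -> Optional[bytes]:
        if key in self.try_tier:
            return self.try_tier[key]
        # Fall back to objects produced by integration-branch builds (read-only).
        return self.integration_tier.get(key)

    def put(self, key: str, obj: bytes) -> None:
        # Try builds only write to their own tier; the integration tier is never modified.
        self.try_tier[key] = obj
```

With this layout, a Try push gets cache hits for any object already compiled on an integration branch, while its own results stay isolated.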


Project #3 - Metrics

Accomplished recently:

Upcoming:
  • Putting Mozharness steps’ data inside Treeherder’s database for aggregate analysis

Other
Upcoming:
  • TaskCluster Linux builds are currently built using a mix of m3/r3/c3 2xlarge AWS instances, depending on pricing and availability. We’re going to assess the effect on build speed of using more powerful AWS instance types, as one potential way of reducing end-to-end Try times.



Thursday, June 30, 2016

Adding new jobs to Treeherder is now transparent

As of today, when you add new jobs to a push on Treeherder (among other actions), a job named 'Sch' (short for scheduling) will get scheduled.

This gives two main advantages:

  • You can have access to the output of how the request went
  • A link to file a bug under the right component and CC me

You can see in this screenshot what the job looks like:
In this push you can see three 'Sch' jobs. Each one is for a different action taken.
  1. Backfill a job - Link to log
  2. Add new jobs - Link to log
  3. Trigger all talos jobs - Link to log
You can find the link to file a bug by doing the following:

  1. Click on job (the job's panel will load at the bottom of the page)
  2. Click on "Job details"
  3. You will see "File bug" (see the screenshot below)
Thanks for reading and feel free to file bugs to make the output more understandable!


