Thursday, October 21, 2010

OPSI nightmares on Windows 2008 R2 64-bit

Back in September I worked for five days trying to make OPSI work for our Windows 2008 64-bit reference machine and failed miserably.

I reached a point where I wanted to uninstall OPSI and install it again from the right mount share. The problem is that the uninstall process left the machine wedged and impossible to log in to through RDP. We finally figured out that we had to remove two registry keys to allow the machine to reach the login screen (we used IPMI and safe mode to do this).

I will just list some comments that might help somebody searching for answers related to this. I won't put too much effort into making it presentable or coherent.

NOTE: many, many thanks to cshields for having helped me through those days!

The versions (I think) we are using are:

The version for staging-opsi is 3.4.0.0 and I don't see preloginloader installed on the win64-ix-ref.build.mozilla.org machine. Nevertheless when I click on the package I get 3.4 v27mozilla2.
From bug 596735:
Blue screen before reaching login dialog
What I could see through IPMI: "Opsi Login Blocker: service request failed"
I asked on the OPSI forums and I got this answer:

Re: Cannot uninstall on Windows 2008 x64

Post by wolfbardo » 17 Sep 2010, 08:19
Hello Armen,
armenzg wrote: The machine is MS2008 R2 x64 with OPSI versions 3.4.0.0 and preloginloader package 3.4 v27mozilla2.
In this version there are some issues in the 64-bit handling of files and registry.

You should update your opsi-version to 3.4.0.14 and the opsi-packages
http://download.uib.de/opsi3.4/produkte ... 8.4-1.opsi
http://download.uib.de/opsi3.4/produkte ... .4-69.opsi

Also have a look at opsi 4.0 Release Candidate (viewtopic.php?f=10&t=1747)

armenzg wrote:
What could I try?
Delete the following registry keys:
Code:
[Registry_vista_del_loginblocker]
deletekey [HKEY_LOCAL_MACHINE\SOFTWARE\Microsoft\Windows\CurrentVersion\Authentication\Credential Providers\{d2028e19-82fe-44c6-ad64-51497c97a02a}]
deletekey [HKEY_LOCAL_MACHINE\SOFTWARE\Microsoft\Windows\CurrentVersion\Authentication\Credential Provider Filters\{d2028e19-82fe-44c6-ad64-51497c97a02a}]
regards
bardo wolf
The key was to log in through IPMI in safe mode and remove those two registry keys.
After that the machine was almost good (RDP was not working; jabba fixed this for me in bug 597573).

I also got another machine wedged in the exact same way, not from trying to uninstall OPSI but from forgetting to resume the opsiclientd service.

In bug 596735 comment 12 I mentioned:

This morning I found win64-ix-ref in the same condition and it is *not* from uninstalling OPSI but from having forgotten to resume the opsiclientd service last night (which is mentioned here https://forum.uib.de/viewtopic.php?f=8&t=939#p4694).
After we fixed the prelogin issue bhearsum tried to mount from the right OPSI mount but he was not successful either:
I tried a few things without success:
- preloginloader 3.4-27mozilla2 and 3.4-30 -- These versions successfully
talked to the config server but failed to mount the SMB share. Errors were like
this:
[5] [Oct 20 09:43:11] [action_processor_starter.exe] (2250,
'WNetCancelConnection2', 'The network connection could not be found.')
(Windows.pyo|237)
[4] [Oct 20 09:43:11] [action_processor_starter.exe] Mounting
'\\production-opsi\opt_pcbin' to 'P:' (Windows.pyo|239)
[2] [Oct 20 09:43:12] [action_processor_starter.exe] Cannot mount: (1312,
'WNetAddConnection2', 'A specified logon session does not exist. It may already
have been terminated.') (Windows.pyo|253)
[1] [Oct 20 09:43:12] [action_processor_starter.exe] Traceback:
(Logger.pyo|647)
[1] [Oct 20 09:43:12] [action_processor_starter.exe] line 71 in
'' in file 'action_processor_starter.py' (Logger.pyo|647)
[1] [Oct 20 09:43:12] [action_processor_starter.exe] line 254 in 'mount'
in file 'OPSI\System\Windows.pyo' (Logger.pyo|647)
[1] [Oct 20 09:43:12] [action_processor_starter.exe] ==>>> Cannot mount:
(1312, 'WNetAddConnection2', 'A specified logon session does not exist. It may
already have been terminated.') (action_processor_starter.py|83)
[6] [Oct 20 09:43:12] [action_processor_starter.exe] Executing jsonrpc method
'setStatusMessage' (JSONRPC.pyo|225)
[6] [Oct 20 09:43:12] [action_processor_starter.exe] Options: {'params':
"Failed to process action requests: Cannot mount: (1312, 'WNetAddConnection2',
'A specified logon session does not exist. It may already have been
terminated.')"} (JSONRPC.pyo|229)
[7] [Oct 20 09:43:12] [action_processor_starter.exe] Appending param: Failed
to process action requests: Cannot mount: (1312, 'WNetAddConnection2', 'A
specified logon session does not exist. It may already have been terminated.'),
type: (JSONRPC.pyo|238)
[7] [Oct 20 09:43:12] [action_processor_starter.exe] jsonrpc string:
{"params":["Failed to process action requests: Cannot mount: (1312,
'WNetAddConnection2', 'A specified logon session does not exist. It may already
have been terminated.')"],"id":1,"method":"setStatusMessage"}
(JSONRPC.pyo|248)
[7] [Oct 20 09:43:12] [action_processor_starter.exe] requesting:
'https://localhost:4441/opsiclientd', query '{"params":["Failed to process
action requests: Cannot mount: (1312, 'WNetAddConnection2', 'A specified logon
session does not exist. It may already have been
terminated.')"],"id":1,"method":"setStatusMessage"}' (JSONRPC.pyo|250)
[6] [Oct 20 09:43:12] [action_processor_starter.exe] Using method POST
(JSONRPC.pyo|283)
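
The excerpt above is hard-wrapped, which makes it tedious to see that only two distinct Windows errors are involved. Here is a minimal Python sketch for pulling them out. It assumes the error tuples always look like (code, 'ApiCall', 'message'), as they do in this excerpt; the name distinct_errors is made up for illustration and is not part of OPSI.

```python
import re

# Matches error tuples such as
# (1312, 'WNetAddConnection2', 'A specified logon session does not exist. ...')
ERROR_TUPLE = re.compile(r"\((\d+),\s*'(\w+)',\s*'([^']*)'\)")


def distinct_errors(log_text):
    """Return the distinct (code, api_call, message) tuples, in first-seen order."""
    # The log is hard-wrapped, so rejoin the lines before matching.
    flat = " ".join(line.strip() for line in log_text.splitlines())
    seen = []
    for code, api_call, message in ERROR_TUPLE.findall(flat):
        entry = (int(code), api_call, message)
        if entry not in seen:
            seen.append(entry)
    return seen


# Two entries copied from the log excerpt in this post.
sample = """[5] [Oct 20 09:43:11] [action_processor_starter.exe] (2250,
'WNetCancelConnection2', 'The network connection could not be found.')
(Windows.pyo|237)
[2] [Oct 20 09:43:12] [action_processor_starter.exe] Cannot mount: (1312,
'WNetAddConnection2', 'A specified logon session does not exist. It may already
have been terminated.') (Windows.pyo|253)"""

for code, api_call, message in distinct_errors(sample):
    print(code, api_call, message)
```

Run against the full log, this should boil the noise down to the 2250 and 1312 errors, which are the two things worth searching for.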

I also tried preloginloader 3.4-69, which failed to even connect to the config server; it claimed that it timed out.
So yeah... you can understand if I ever say I don't like OPSI and suggest you never get close to it.


Creative Commons License
This work by Zambrano Gasparnian, Armen is licensed under a Creative Commons Attribution-Noncommercial-Share Alike 3.0 Unported License.

Friday, October 15, 2010

Releng contributors from Seneca: Mozharness and buildapi

When I wrote my previous post trying to get students to contribute to Release Engineering I did not expect to have up to 8 students getting involved with us. 4 of them will be involved with the Fedora-Mozilla project.

This quarter we are going to have students contributing to the following projects:
It is going to be very interesting to work with so many students. I have to be careful not to be dragged down, but at the same time I believe it is very valuable to both the students and our team.

You can ignore the rest of the post if you are not one of the students. Unless you are very curious to see what their options/projects are going to be.

I am going to expand a little on what the last 3 points involve. This should get adrianp, mustafaj, asingh, jing yan started. Please congratulate them when you see them on an IRC channel for taking up the challenge.

To understand how our systems work, check out this new hire brownbag.

Mozharness
Mozharness is a new idea which is still being baked by aki. The goal is to pull out much of the build logic that is currently intertwined with the buildbot factories (there are more than 40 different factories and process/factory.py has passed 8 thousand lines). The benefit of doing this is that it would become a standalone project which would be very easy to contribute to and run locally without having to set up buildbot.
One problem is that much of the API is still in the works, but at the same time there is a lot of value in students facing real-life development rather than isolated school development.

adrianp and mustafaj will be planning their development in here:
* http://zenit.senecac.on.ca/wiki/index.php/Mozharness
and here is the repo to work against:
* http://hg.mozilla.org/users/asasaki_mozilla.com/mozharness

Here are some thoughts on where this project could get some love:
  • logging options
  • documentation
  • add new scripts
    • do a locale repackage (easy)
      • setup the right environment
      • download en-US nightly
      • repackage with the right locale and strings
    • run packaged unit tests
    • generating MAR files (updates)
    • signing
  • give love to mozharness/scripts/partner-repacks.py
  • porting hgtool.py to mozharness as a supporting library
  • some POCs (proof-of-concepts)
Tip on writing new scripts:
  • Use the two existing mozharness scripts to understand what format to follow
  • To find out what steps are used for a certain type of job you have two places to check:
    • Look for logs on tinderbox. There is one for L10n as well. Each step starts with
      "======== BuildStep started ========"
      • TinderboxPrint steps are metadata to show on the tinderbox boxes and to be scraped by other tools (which slave the job ran on, changeset used, etc)
    • Look at process/factory.py and ask someone in releng which factory corresponds to the script you are trying to write. For instance, L10n repacks correspond to BaseRepackFactory and NightlyRepackFactory. NOTE that there is inheritance involved.
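
If you save one of those logs locally, a few lines of Python are enough to see where each step begins. This is only a sketch: the one thing taken from the real logs is the "BuildStep started" marker quoted above. The helper names and the sample log are invented for illustration, and I am assuming metadata lines start with "TinderboxPrint".

```python
STEP_MARKER = "======== BuildStep started ========"


def split_steps(log_text):
    """Split a log into chunks of text, one per BuildStep marker."""
    chunks = log_text.split(STEP_MARKER)
    # Anything before the first marker is preamble, not a step.
    return [chunk.strip() for chunk in chunks[1:]]


def tinderbox_print_lines(step_text):
    """Return the metadata lines of a step (assumed prefix: TinderboxPrint)."""
    return [line for line in step_text.splitlines()
            if line.startswith("TinderboxPrint")]


# Invented sample; a real log has far more output per step.
sample_log = "\n".join([
    "log preamble",
    STEP_MARKER,
    "first step output",
    "TinderboxPrint: example metadata",
    STEP_MARKER,
    "second step output",
])

steps = split_steps(sample_log)
print(len(steps), "steps found")
for number, step in enumerate(steps, 1):
    print(number, tinderbox_print_lines(step))
```

Listing the steps this way gives you a checklist of what your new script has to reproduce.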
Thoughts on logging:
The most standalone item I could find in mozharness is fleshing out the logging options/todo items:
Namely:
 - network logging
 - ability to change log settings mid stream?  Meaning, can we have a logger that's set to INFO level; can we set it to DEBUG halfway through the script?  If not, that's ok, but I'd like to know.
 - per-module/class log settings: e.g. can we have anything that goes through a Mercurial object to be DEBUG with a timestamp, and everything else be INFO with no timestamp?
 - turn off global logging settings:  when I create two log objects, I think I end up getting duplicate log lines on screen and in logfiles, even if I'm only calling one of them.
 - generic log rotation that is configurable
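
Several of the questions above can be explored with nothing but Python's standard logging module. The following is a rough sketch, not mozharness code: the logger names example.script and example.mercurial are made up, and mozharness's own log classes may wrap things differently.

```python
import io
import logging
import logging.handlers
import os
import tempfile

# A logger with its own handler; output goes to a buffer so it is easy to inspect.
script_out = io.StringIO()
script_handler = logging.StreamHandler(script_out)
script_handler.setFormatter(logging.Formatter("%(levelname)s %(name)s: %(message)s"))
script_log = logging.getLogger("example.script")
script_log.addHandler(script_handler)
script_log.propagate = False  # do not hand records on to ancestor loggers as well
script_log.setLevel(logging.INFO)

script_log.debug("dropped, the level is INFO")
script_log.setLevel(logging.DEBUG)  # the level can be changed mid-stream
script_log.debug("kept, the level is now DEBUG")

# Per-module settings: a second logger with its own level and a timestamped format.
hg_out = io.StringIO()
hg_handler = logging.StreamHandler(hg_out)
hg_handler.setFormatter(logging.Formatter("%(asctime)s %(levelname)s %(message)s"))
hg_log = logging.getLogger("example.mercurial")
hg_log.addHandler(hg_handler)
hg_log.propagate = False
hg_log.setLevel(logging.DEBUG)
hg_log.debug("hg pull")

# Configurable rotation: roll over at about 1 KB and keep 3 old files.
log_path = os.path.join(tempfile.mkdtemp(), "script.log")
rotating = logging.handlers.RotatingFileHandler(log_path, maxBytes=1024, backupCount=3)
script_log.addHandler(rotating)
script_log.info("this line also goes to the rotating file")

print(script_out.getvalue(), end="")
print(hg_out.getvalue(), end="")
```

Duplicate lines usually appear when a record is handled by a logger's own handler and then again by a handler attached to an ancestor (or when the same handler is added twice). Setting propagate to False is one way to stop the first case; whether that is what happens in mozharness is not confirmed here.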


Hopefully the patches would end up being generic and reusable, but optional.  For instance, if they created the SimpleNetworkLogger and MultiNetworkLogger objects that could be drop-in replacements for SimpleFileLogger or MultiFileLogger, that would be awesome.
Problems faced by aki:
I need to figure out how to keep the "generic harness" (anyone at any company using python scripts could find them useful, like log.py, config.py, errors.py, script.py) separate from the Mozilla-specific code (l10n.py or repack.py, aus2.py, talos.py, etc.) 
Porting hgtool.py
mozharness and hgtool.py will need to communicate somehow.
I was going to just call hgtool.py from commandline in AbstractMercurialScript, but Catlee wants to merge it in.
So I'd lean towards creating a generic mozharness/lib/source/mercurial.py that has generic methods.

(Generic meaning we would hopefully later be able to create a mozharness/lib/source/git.py, or a mozharness/lib/source/svn.py, and be able to use them interchangeably.  SOURCEMODULE.checkout() for example, where SOURCEMODULE is any of the above.)
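
To make the "interchangeable" idea concrete, here is a small Python sketch. All class and method names are hypothetical and nothing here is taken from hgtool.py or mozharness; the only real things are the hg and git command lines being built.

```python
import subprocess
from abc import ABC, abstractmethod


class SourceModule(ABC):
    """Hypothetical common interface for version control back ends."""

    @abstractmethod
    def checkout_command(self, repo, dest, branch=None):
        """Return the command line (as a list) that fetches repo into dest."""

    def checkout(self, repo, dest, branch=None):
        # Run the back end's command; raises CalledProcessError on failure.
        subprocess.check_call(self.checkout_command(repo, dest, branch))


class MercurialSource(SourceModule):
    def checkout_command(self, repo, dest, branch=None):
        cmd = ["hg", "clone"]
        if branch:
            cmd += ["--branch", branch]
        return cmd + [repo, dest]


class GitSource(SourceModule):
    def checkout_command(self, repo, dest, branch=None):
        cmd = ["git", "clone"]
        if branch:
            cmd += ["--branch", branch]
        return cmd + [repo, dest]


# The URL is the mozharness repo from this post; a Git back end would of
# course need a Git URL. Only the commands are printed, nothing is cloned.
repo = "http://hg.mozilla.org/users/asasaki_mozilla.com/mozharness"
for source in (MercurialSource(), GitSource()):
    print(" ".join(source.checkout_command(repo, "mozharness")))
```

A script written against SourceModule only calls checkout(), so swapping Mercurial for Git (or later svn) becomes a matter of which object gets passed in.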

We can take a lot of that from work already done in Buildbot's Mercurial/Git/etc. steps, but remove any Buildbot dependencies.  And of course Catlee has most of it written already in hgtool.py.
Thoughts on POC:
  • Here are some of aki's thoughts, but I think POCs will make more sense a little later, when we understand where we want to take mozharness. aki's thoughts are good but they involve setting buildbot up, which I want students to avoid at first. We'll leave it for now.
I'd love to see that.  Maybe take a small project and try it in mozharness?
How about something like porting buildbot-configs/test-masters.sh + setup_master.py to mozharness?


Like
|checkconfigMasters.py --list-masters| or |checkconfigMasters.py -8 --only-builder-masters| or something.

That should be a relatively small project, but potentially useful.  We should end up with something way more configurable, and verify that you know your way around the harness.

Open to other ideas for a first project too... whatever POC script should hopefully be doable in a day or two.
BuildAPI
ssalbiz (our current kick-ass intern) has taken some time to prepare instructions for our new students on how to set up BuildAPI locally.
I still scratch my head trying to understand why it is called BuildAPI, because this part of the project is more of a web-based project. It is based on Pylons (a Python web framework) and it also uses the Google Chart Tools API (I think).

asingh and jing yang would probably coordinate this in here:
http://zenit.senecac.on.ca/wiki/index.php/BuildAPI

What I need students to do is one of these two:
  • generate graphs, charts, CSVs and CPU totals for infrastructure load blog posts like this
    • this is very useful and could move us forward towards having this information being published publicly for consumption
    • I highly encourage this one as understanding the mental model behind it is easier
  • write a tool that analyzes our statusDB and figures out which slaves have been continually burning jobs (sometimes it takes us several days to spot them)
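
For the second option, the core of such a tool could be as small as the following Python sketch. It makes no claim about the real statusDB schema: it assumes you have already queried the jobs into (slave name, succeeded) pairs ordered from oldest to newest, and the slave names and the threshold of 3 are invented.

```python
from collections import defaultdict


def burning_slaves(jobs, threshold=3):
    """Return the slaves whose most recent `threshold` jobs all failed.

    jobs: iterable of (slave_name, succeeded) pairs, oldest first.
    """
    history = defaultdict(list)
    for slave, succeeded in jobs:
        history[slave].append(succeeded)
    return sorted(
        slave for slave, results in history.items()
        if len(results) >= threshold and not any(results[-threshold:])
    )


# Invented sample data.
jobs = [
    ("slave-a", True),
    ("slave-b", False),
    ("slave-a", False),
    ("slave-b", False),
    ("slave-b", False),
    ("slave-a", True),
]
print(burning_slaves(jobs))
```

Looking only at the most recent jobs (instead of an overall failure rate) is a deliberate choice here: a slave that failed a lot last week but is green now should not be flagged.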
They can get started by pulling the repo and these snapshots from August:
Simple releng bugs
Out of the list of bugs that we have tagged as [simple] on the Whiteboard I have pulled the following bugs:
  • bug 563941 - Fake signing for staging releases
  • bug 437482 - create Mercurial bundles
  • bug 510770 - make source-package
  • bug 586664 - Normalize builder names
  • bug 563939 - Staging release should download previous release to staging-stage
  • bug 577696 - Automate sending an email to metrics team once a release is pushed live
  • bug 590329 - fail early if the clobberer is broken (at least for releases) 
This work could be coordinated in here:
http://zenit.senecac.on.ca/wiki/index.php/Release_Simple_Bugs#Project_Description

NOTE: All of these bugs would require you to set up buildbot and use Dummy factories to skip certain parts of our automation. Three or four of these bugs could count as one student project. With fewer than that there is no value.

[EDIT] I have fixed a couple of URLs



Wednesday, October 13, 2010

How we use Narro to localize Firefox

Narro is a community-maintained web-based project (maintained mainly by alexxed, AFAIK) which is hosted at https://l10n.mozilla.org/narro.

This web interface allows localizers to translate Firefox without having to deal with Mercurial, compare-locales or tools to generate language packages. It is easy to use, speeds up the translation process and improves collaboration between new contributors and veterans.

The way I have used it so far is:
  • Our Armenian localizer lead visits the "Translate" tab and starts translating
  • Since he is the main localizer he is also an approver and by default his changes are auto-approved
  • He should also be visiting the "Review" tab once in a while to approve and discard suggestions from people in the community
  • Once in a while I will visit the "Export" tab to generate a zip file (Image 1)
    • I still have not figured out when to have these checkboxes checked before exporting:
      • " - I believe I don't need this one as it gives me .xml files
      • " - 
  • I unzip the archive containing the Armenian repo exported from Narro over my local Mercurial checkout, overwriting everything
  • I run the following commands:
    • find . -type f -exec chmod -x {} \;
    • headers=/Users/armenzg/moz/armenian/patches/headers
      modules=(browser dom extensions netwerk other-licenses security toolkit)
      for module in "${modules[@]}"; do
        for ext in dtd properties; do
          # Prepend the matching license header to every file with this extension
          find "$module" -type f -name "*$ext" | while read -r file; do
            cat "$headers/header.$ext" "$file" > "$file.new" && mv "$file.new" "$file"
          done
        done
      done
    • I don't know yet at which point the permissions get changed to +x between the Narro server and the zip file. According to alexxed, the Armenian project does not have +x on the files on disk.
    • I also have to re-insert the license headers, as it seems that Narro uses very old headers which I have not yet learned how to override
  • After analysing the output of "hg diff" I decide to commit
  • My commit will add new translations that have been approved by the Armenian localization lead and new en-US strings that Narro has imported into the Armenian project
  • My commit changes will be shown in tomorrow's nightly
    • You can see that I name my commits with "Narro import for $date"
  • You can see the status of the Armenian locale and the output of compare-locales after having pushed my changes
  • The new compare-locales 0.9 can even catch parsing problems (Image 2):
    • Fixing those with Narro can be really painful
    • A trick is to uncheck "Show files" and "Show folders" when looking at the "Files" tab
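
For anyone who would rather not maintain the shell loop from the steps above, the same two clean-up steps can be sketched in Python. This is an illustration, not what I actually run: the function names are made up, the header files are assumed to be called header.dtd and header.properties as in my commands, and unlike the shell version it skips files that already start with the header.

```python
import os
import stat
import tempfile

MODULES = ("browser", "dom", "extensions", "netwerk", "other-licenses",
           "security", "toolkit")
EXTENSIONS = ("dtd", "properties")


def clear_exec_bits(root):
    # Same idea as: find . -type f -exec chmod -x {} \;
    no_exec = ~(stat.S_IXUSR | stat.S_IXGRP | stat.S_IXOTH)
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            path = os.path.join(dirpath, name)
            os.chmod(path, stat.S_IMODE(os.stat(path).st_mode) & no_exec)


def prepend_headers(root, headers_dir):
    for ext in EXTENSIONS:
        with open(os.path.join(headers_dir, "header." + ext), "rb") as f:
            header = f.read()
        for module in MODULES:
            for dirpath, _dirnames, filenames in os.walk(os.path.join(root, module)):
                for name in filenames:
                    if not name.endswith(ext):
                        continue
                    path = os.path.join(dirpath, name)
                    with open(path, "rb") as f:
                        body = f.read()
                    # Unlike the shell loop, leave files that already have the header.
                    if not body.startswith(header):
                        with open(path, "wb") as f:
                            f.write(header + body)


# Tiny throwaway checkout to try it on; contents are invented.
root = tempfile.mkdtemp()
headers = tempfile.mkdtemp()
os.makedirs(os.path.join(root, "browser"))
with open(os.path.join(headers, "header.dtd"), "wb") as f:
    f.write(b"<!-- license -->\n")
with open(os.path.join(headers, "header.properties"), "wb") as f:
    f.write(b"# license\n")
sample = os.path.join(root, "browser", "sample.dtd")
with open(sample, "wb") as f:
    f.write(b'<!ENTITY hello "Hello">\n')
os.chmod(sample, 0o755)

clear_exec_bits(root)
prepend_headers(root, headers)
with open(sample, "rb") as f:
    print(f.read().decode())
```

Because of the header check, running it twice on the same checkout does not stack a second copy of the license on top of the first.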
I hope this feeds the curiosity of the people who have asked me how I use Narro to drive the localization of Firefox.

I will explain more in other blog posts on how I use compare-locales, how I used bitbucket before we became an official localization team and how I sign-off my revision for a beta. So much to blog about!

Image 1 - This is the output of exporting a Narro project. It gives you an XPI, a diff and an exported project

Image 2 - Compare-locales 0.9 now also shows parsing errors and warnings besides missing strings




Tuesday, October 12, 2010

Feedback wanted - Mozilla's Release Engineering presentation at FSOSS: draft 1

On October 29th I will be presenting at FSOSS in Toronto talking about Release Engineering at Mozilla through buildbot.

This is the first time that I am putting together a presentation of this scale and I have found preparing the right content a difficult task. That is why I want to do it in the open: to get feedback early and to prepare something that is educational and worth other people's time.

The people attending the session will be from various sectors and not necessarily familiar with the functions of a release engineering team.

Link to draft.



Presentation's goals
My goals are the following:
  • Explain in a progressive manner how our infrastructure helps developers know if their changes are good
  • Explain how our automation has grown and how much we have been able to do thanks to buildbot
  • Explain the release process, which is made of different teams joining forces to create a product that is delivered to end users
Currently I have started to develop the first two goals as I would rather have two parts very well developed rather than three incomplete parts.

What do you think? Shall I give more emphasis to the third point? What do you think would be more interesting for people to learn from Release Engineering?

Structure of the presentation
You can look at the whole slide deck or at this brief bullet list outlining the structure of the presentation.
  • Brief introduction
  • Cycle of a push
  • Builds and results
  • Scaling
  • Release process
  • Buildbot
  • Q&A
  • Wrap-up
NOTES:
I am tackling this project as an iterative process and allowing room for cutting scope without damaging the overall quality. I will provide full sources once done and provide all the images and diagrams for re-use.

