Issue2892

classification
Title: Migrate from hg.python.org to GitHub
Type: rfe Severity: normal
Components: Any Versions:
Milestone: Jython 2.7.3
process
Status: closed Resolution: accepted
Dependencies: Superseder:
Assigned To: jeff.allen Nosy List: darjus, jeff.allen, stefan.richthofer, zyasoft
Priority: high Keywords:

Created on 2020-05-25.10:37:35 by jeff.allen, last changed 2022-06-02.19:39:23 by jeff.allen.

Messages
msg13062 (view) Author: Jeff Allen (jeff.allen) Date: 2020-05-25.10:37:35
As in PEP-512 (https://www.python.org/dev/peps/pep-0512/) and for much the same reasons, we've said we'd migrate to GitHub. This seems the apposite time, before we get started on bug-fixes and Jython 3. This implicitly includes using git rather than mercurial in the development environment.

I'm tentatively assigning myself the ticket. I have a bit of experience now with git, so can probably manage this without making too much of a mess. The main difficulty is to avoid something that works but has some fearful drawback later on. I'll dry run it in my own space first, fix a bug in the dry-run repository, then repeat for real. If anyone knows a lot about this, help would be welcome.

Fortunately, CPython have done this before us. (And lots of others, but they're nearest, and in public.) Discussion on the core-workflow list begins around the start of 2016.
https://mail.python.org/archives/list/core-workflow@python.org/2016/1/
This is something like the final plan:
https://mail.python.org/archives/list/core-workflow@python.org/thread/NZDI44KRG7NUYI2PDHPDW6ZXQTAJZ7P7/


We shouldn't expect to get the same tool integration as CPython (e.g. updating bugs.jython.org with changes on GitHub). We'd have easier use of tools supporting the development process if we were in the Python organisation, from GitHub's POV. If we can work as well as now, haven't dropped any critical information, and can merge PRs via GitHub, I would count that a success.

There will be several details to take care of, some general to Python, some specific to Jython. I doubt I will spot them all first time. Here's a start ...


In the generic category:

* Converting the repo (history, contributor names etc.): https://mail.python.org/archives/list/core-workflow@python.org/thread/KXSKDHUOSGX6WSSH54I7HIH33SDLYBIW/

* Contributor names: 

* sys._git: https://bugs.python.org/issue27593

* git and new lines: https://bugs.python.org/issue27425


Specific to Jython:

* Continue building with ant, with minimal change, but using git in place of (i.e. no longer supporting) hg to get version control information. (git must be available on the path or you get an uncontrolled snapshot of some kind.)

* Have concurrent branches like CPython and the dev-guide. (We haven't really done that, but need to, in a small way.)
msg13070 (view) Author: Jeff Allen (jeff.allen) Date: 2020-05-28.07:02:18
I have tried this several times in my own account under the names jython-redstart, -flycatcher, -nightjar. (They all migrate to the UK, geddit?)

This works pretty well, except when it comes to connecting with GitHub user names, where GitHub asks for help identifying 80 (yes, read 'em and weep) contributors. 

This is a lie. GitHub has identified all but about a dozen cases. However, the UI is confusing, doesn't distinguish these cases, and experimenting means importing it all again. After an evening of research I can only correlate about 6 proper "lost users" confidently with GitHub usernames (and 3 of those are Darjus!).

Do we know for sure that this is "our" Finn Bock? https://github.com/bckfnn
msg13074 (view) Author: Jeff Allen (jeff.allen) Date: 2020-05-30.16:55:02
I still find the GitHub user name wizard confusing, but I've got it to do something sensible (I think). Basically, it lets you substitute one email address for another in change sets.

It appears to list all the addresses it has found, whether it has identified the account or not, and so it is a lot of work to identify the ones that properly need a response. (I've done this.)  To identify the account correctly, the e-mail address has to be (in some sense) public in that account. Where it is not, there are three alternatives:

1. Choose an account, in which case the wizard will (most times) substitute a made-up @users.noreply.github.com address.
2. Supply an e-mail address you know it can correlate with the right account.
3. Leave it as it is, attributed to just the e-mail account.

Where the anomaly is the result of an obvious slip (e.g. jeff@localhost) alternative 2 seems sensible.

Where we can't reliably identify the committer with an account, I propose alternative 3. If they subsequently relate their account publicly with the address (verifiably to GitHub), I expect that to fix the problem.

The difficult case is where someone has used several addresses over time and not all are associated publicly with their account. All possibilities are open, but fortunately, where it matters for us, alternative 2 results in the same domain name, so we're not rewriting history too badly.

The result is: https://github.com/jeff5/jython-nightjar

Next I'll see if I can work with it, and if all seems ok, I'll repeat all this at https://github.com/jython/jython .
msg13082 (view) Author: Jeff Allen (jeff.allen) Date: 2020-05-31.09:52:10
This change includes moving from Mercurial to Git as the SCM tool, and therefore sys._mercurial becomes sys._git, and our Ant script must generate the values.

The corresponding change in CPython (between 2.7.13 and 2.7.14) is here:
https://github.com/python/cpython/commit/2c7085fd7b00cba8b5ab258c62453b6a12418b73

Note that this change is abrupt: sys._mercurial simply disappears from the name space. I propose the same approach.
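
Because the old name simply vanishes, code that reads the version-control attribute may need to cope with either name for a while. A minimal sketch, assuming only that the attribute is a tuple as in CPython; the helper name scm_info is hypothetical and is not part of Jython or CPython:

```python
import sys

def scm_info(module=sys):
    """Return (attribute_name, value) for whichever SCM attribute exists."""
    # Prefer the new name; fall back to the one used before the migration.
    for name in ("_git", "_mercurial"):
        value = getattr(module, name, None)
        if value is not None:
            return name, tuple(value)
    # Neither attribute is present on this implementation or version.
    return None, None

name, value = scm_info()
print(name, value)
```

The module argument exists only so the function can be exercised against stand-in objects; ordinary callers would leave it at the default.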
msg13088 (view) Author: Jeff Allen (jeff.allen) Date: 2020-05-31.17:22:18
CPython (as observed in Travis CI) signs on like this:

CPython 2.7.15+ (heads/2.7:89b5ea2, Dec 19 2018, 15:16:35)
CPython 3.10.0a0 (heads/master:007bb06, May 31 2020, 03:31:05)

It's how the git version info appears that I want to emulate, as an indication I've mined Git correctly for the information. "master" vs "2.7" is a function of the branch we're on. Jython (clean from a commit) now signs on with:

Jython 2.7.3a1-DEV (heads/master:ba16fbc, May 31 2020, 18:10:49)
...
>>> sys._git
('Jython', 'heads/master', 'ba16fbc')

Seems like I can generate an acceptable sys.version from Ant and Gradle, and it all ties in as you'd hope when pushed to the dry-run repo:
https://github.com/jeff5/jython-nightjar/commit/ba16fbc48da14521906e983aa49312630f0b3e55
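
The relationship between sys._git and the sign-on line can be sketched as below. This is an illustration loosely modelled on the way CPython's getbuildinfo.c joins the identifier and revision with a colon, not the code Jython's build actually uses; the function names build_info and banner are hypothetical.

```python
def build_info(git, date, time):
    """Compose the parenthesised part of the banner from a sys._git-style tuple."""
    _, identifier, revision = git
    # Assumption: the colon appears only when there is a revision to show.
    sep = ":" if revision else ""
    return "%s%s%s, %s, %s" % (identifier, sep, revision, date, time)

def banner(name, version, git, date, time):
    """Compose a sign-on line of the shape quoted above."""
    return "%s %s (%s)" % (name, version, build_info(git, date, time))

print(banner("Jython", "2.7.3a1-DEV",
             ("Jython", "heads/master", "ba16fbc"),
             "May 31 2020", "18:10:49"))
# prints: Jython 2.7.3a1-DEV (heads/master:ba16fbc, May 31 2020, 18:10:49)
```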
msg13090 (view) Author: Jeff Allen (jeff.allen) Date: 2020-06-01.13:19:49
The next high-risk change is transferring issues we have collected on jythontools/jython to jython/jython (when it exists). One cannot transfer issues between repositories in different organisations (here from jythontools/ to jython/) but one may transfer a whole repository.

So I think the answer involves transferring ownership of jythontools/jython to jython, so we can then transfer the (open) issues. But we don't want it to arrive called /jython so change the name *first* to jython-mirror.

I tried transferring issues between repos on my own account and renaming the destination, then transferring it to the jython organisation, and GitHub just sucks all this change up, so that a link to the original issue lands in the new place. Try it with https://github.com/jeff5/jython-nightjar/issues/1 and you end up at https://github.com/jython/jython-mirror-nightjar/issues/4, despite the transfer issue, rename repo, transfer repo dance it has been through.

So I think the sequence will be:
1. Create jython/jython by import from hg.python.org as described already.
2. Rename jythontools/jython to jythontools/frozen-mirror.
3. Transfer jythontools/frozen-mirror to jython/frozen-mirror.
4. Transfer open and closed-fixed issues from jython/frozen-mirror to jython/jython. 

There is a twist in step 4: issues get new numbers when moved, so I propose to make a note on jython/frozen-mirror fixed issues of their current number, before the transfer, to tie up with earlier records. I could update NEWS with the new number, but I can't fix the (now incorrect) numbers embedded in change sets. We'll just have to put up with numbers below 190 maybe being duplicates. (There are only 6 such.)
msg13094 (view) Author: Jeff Allen (jeff.allen) Date: 2020-06-03.09:31:12
The last thing to test is that the migrated repository still allows us to make a release.

I don't want actually to release anything (it would be a 2.7.3a1 almost identical to 2.7.2). The last part, packaging and staging at Sonatype, won't have changed. The thing to test is branding with the version control information and the tripwires that prevent us releasing something that only works because of files not checked in.

Oh, and to have the instructions to do it again.
https://github.com/jython/devguide/pull/10

I've been through all this now, but as a dry-run in my own space. The use of information from Git took a lot of trial and error. I'm copying CPython
https://github.com/python/cpython/blob/3.8/configure.ac#L42
but those particular commands, and the way the results get used,
https://github.com/python/cpython/blob/3.8/Modules/getbuildinfo.c#L35
may not be the best choice. The names owe a lot to the legacy of svn and hg and do not reflect very well what they contain.

However, git has a lot of options. When it comes to tripwires in the release process, slightly different git queries are used.
https://github.com/python/release-tools/blob/master/release.py#L442
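
For reference, the three queries in the configure.ac linked above can be tried from Python as follows. This is a sketch under assumptions: the command lines are as we read them in CPython's 3.8 branch (only the describe form is confirmed elsewhere in this issue), the names QUERIES, run_query and git_info are hypothetical, and the real Jython build does the equivalent from Ant. When git is not on the path, or the directory is not a working copy, each value comes back empty, which corresponds to the "uncontrolled snapshot" case.

```python
import subprocess

# Queries as read from CPython's configure.ac (3.8 branch); an assumption
# of this sketch, not a statement of what Jython's build.xml runs.
QUERIES = {
    "version": ["git", "rev-parse", "--short", "HEAD"],
    "tag": ["git", "describe", "--all", "--always", "--dirty"],
    "branch": ["git", "name-rev", "--name-only", "HEAD"],
}

def run_query(argv, cwd=None):
    """Return the stripped output of a command, or "" if it cannot be run."""
    try:
        result = subprocess.run(
            argv, cwd=cwd, check=True,
            stdout=subprocess.PIPE, stderr=subprocess.DEVNULL,
            universal_newlines=True)
    except (OSError, subprocess.CalledProcessError):
        # git is not on the path, or this is not a git working copy.
        return ""
    return result.stdout.strip()

def git_info(cwd=None):
    """Run every query and collect the results by name."""
    return {key: run_query(argv, cwd) for key, argv in QUERIES.items()}

print(git_info())
```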

Next is to do it for real, after which hg.python.org/jython should never be modified again.
msg13096 (view) Author: Jeff Allen (jeff.allen) Date: 2020-06-03.09:55:05
The discrepancies in what the sign-on banner and sys._git produce have been bugging me. I fell foul of them too in the tripwires in the build that detect whether we're at the tag matching the release, by checking build.git.tag against jython.release.

The sign-on includes sys._git[1]. Here are some examples of sys._git from CPython:
('CPython', 'v2.7.16', '413a49145e')
('CPython', 'v3.7.3', 'ef4ec6ed12')
('CPython', 'tags/v3.8.0', 'fa919fd')
and from my dry-run
('Jython', 'tags/v2.7.3a1', '625fdf3b1')

Why the slightly ugly "tags/" prefix in some cases and not others? The code in configure.ac has not changed between these versions. In all cases, we are seeing the output of 'git describe --all --always --dirty'.

It turns out that the culprit is actually git, which differs by version due to a regression fixed here:
https://github.com/git/git/commit/1bba00130a1a0332ec0ad2f878a09ca9b2b18ee2
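
One way to make a tripwire insensitive to the git version is to normalise the describe output before comparing it with the release. A minimal sketch, assuming release tags are named "v" plus the release string (as the examples above suggest); the function names parse_describe and at_release_tag are hypothetical and this is not the logic in the Ant build:

```python
def parse_describe(text):
    """Split 'git describe --all --always --dirty' output into (kind, name, dirty)."""
    dirty = text.endswith("-dirty")
    if dirty:
        text = text[:-len("-dirty")]
    for kind in ("tags", "heads", "remotes"):
        prefix = kind + "/"
        if text.startswith(prefix):
            return kind, text[len(prefix):], dirty
    # No prefix: the kind of ref cannot be told from the text alone.
    return None, text, dirty

def at_release_tag(describe_output, release):
    """True if the output names the release tag and the working tree is clean."""
    kind, name, dirty = parse_describe(describe_output)
    return (not dirty) and kind in ("tags", None) and name == "v" + release

print(at_release_tag("tags/v2.7.3a1", "2.7.3a1"))
print(at_release_tag("heads/master", "2.7.3a1"))
```

Accepting the unprefixed form means a branch that happened to be named like a release tag could pass the check when reported without its prefix; a stricter tripwire would need a different query.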
msg13100 (view) Author: Jeff Allen (jeff.allen) Date: 2020-06-05.10:51:52
I think I've done what this set out to achieve, so tentatively claiming fixed.

@darjus, I've been including you on this because I think the mirroring you set up from hg.python.org/jython can stop now. I may have broken it anyway, by moving the mirror repo, but there should be no more pushes anyway to trigger it.
msg13110 (view) Author: Jeff Allen (jeff.allen) Date: 2020-07-18.18:44:45
Turns out the GitHub import tool didn't work well at all: it mucks up the history by dropping parent pointers. I even found some that seem to have more to do with the date than the actual parentage.

This is quite annoying as we have settled in, accepted PRs and so on.

I'm currently betting on https://github.com/frej/fast-export to give us a correct result (in my private repo). That is quite difficult to drive with the manual patch-up, but it seems to be working.

Assuming it does, we must also figure out how we can replace jython/jython with the outcome.
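
A rough check for dropped parent pointers is to count commits by their number of parents and compare the totals with the Mercurial original: lost parents show up as merges that have become ordinary commits, or as extra root commits. A minimal sketch, assuming the input is the text printed by 'git rev-list --parents --all' (each line is a commit followed by its parents); the function name parent_profile and the sample hashes are invented for illustration, and this is not how the problem was actually found:

```python
from collections import Counter

def parent_profile(rev_list_output):
    """Count commits by number of parents from 'git rev-list --parents' output."""
    profile = Counter()
    for line in rev_list_output.splitlines():
        fields = line.split()
        if fields:
            # The first field is the commit itself; the rest are its parents.
            profile[len(fields) - 1] += 1
    return profile

# Invented history: one root, two ordinary commits and one merge.
sample = """\
d4 b2 c3
c3 a1
b2 a1
a1
"""
print(parent_profile(sample))
```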
msg13112 (view) Author: Stefan Richthofer (stefan.richthofer) Date: 2020-07-19.22:43:11
Does the import tool have a bug tracker? Maybe the issue should be reported there. After a quick search, however, all I found was the contact button on
https://docs.github.com/en/github/importing-your-projects-to-github/about-github-importer
msg13114 (view) Author: Jeff Allen (jeff.allen) Date: 2020-07-20.06:48:26
My second attempt at this (or the 20th, depending how you count) is at https://github.com/jeff5/jython-whinchat and does not have the history problem Jim detected. I would be grateful to as many people as can check my work.

Now I seem to have produced a correct result, subject to confirmation, I'm going to ask GitHub support how we might best replace jython/jython with repo content produced by an identical process. This will also make them aware of the failings of their tool.

There are some gains from having to do this again: I'm much better acquainted with the history, I was also able to fix other niggles (like a change of my own I mis-attributed), and I was able to minimise the number of tags lost with change sets at unnamed heads. It's some compensation for GitHub eating my week-end.
History
Date User Action Args
2022-06-02 19:39:23 jeff.allen set status: open -> closed
2020-07-20 06:48:26 jeff.allen set messages: + msg13114
    versions: - Jython 2.7.3
2020-07-19 22:43:11 stefan.richthofer set nosy: + stefan.richthofer
    messages: + msg13112
2020-07-18 18:44:45 jeff.allen set status: pending -> open
    resolution: fixed -> accepted
    messages: + msg13110
2020-06-05 10:51:52 jeff.allen set status: open -> pending
    resolution: accepted -> fixed
    messages: + msg13100
2020-06-03 09:55:05 jeff.allen set messages: + msg13096
2020-06-03 09:31:13 jeff.allen set messages: + msg13094
2020-06-01 13:19:49 jeff.allen set nosy: + zyasoft, darjus
    messages: + msg13090
2020-05-31 17:22:18 jeff.allen set messages: + msg13088
2020-05-31 09:52:11 jeff.allen set nosy: - bckfnn
    messages: + msg13082
2020-05-30 16:55:02 jeff.allen set messages: + msg13074
2020-05-28 07:02:18 jeff.allen set nosy: + bckfnn
    resolution: accepted
    messages: + msg13070
2020-05-25 10:37:35 jeff.allen create