How to mirror/clone the Google Debian repository
Why do we clone/mirror repositories?
It is not common for a serious production system to have regular internet access, so system administrators often mirror repositories for offline use. The most common method for cloning a repository is rsync, which is well suited to the job. However, it has to be run against an rsync server. Not every mirror supports this, but almost all of the central repositories do.
What is special about the Google apt repository?
Since the Google repository does not support rsync, another method is needed. There are many tools that clone deb repositories over HTTP instead of rsync. They copy an HTTP repository by crawling the directory listings of its folders. Viewed in a browser, https://packages.cloud.google.com/apt/ looks indexable.
In reality, it consists of a set of carefully linked static pages that are not indexable. This is clearly visible at https://packages.cloud.google.com/apt/pool: the folder looks empty even though it is not.
This makes the existing tools unusable, as they depend on the repository being indexable. The nonstandard design of the Google repository also causes other problems. For example, debmirror fails because of a missing directory. Even AptCacher, which works as a proxy, failed to cache packages from the Google repository.
Indexing the non-indexable repository
The Packages file in the repository contains information about all the files, and it is the index used by the package manager: apt uses it to generate download links. For example, information about the amd64 packages is in the Packages file for that architecture. Here is an example entry for the kubelet package from the Packages file for xenial:
Package: kubelet
Version: 1.5.1-00
Installed-Size: 105097
Maintainer: Kubernetes Authors <kubernetes-dev+release@googlegroups.com>
Architecture: amd64
Depends: iptables (>= 1.4.21), kubernetes-cni (>= 0.3.0.1), iproute2, socat, util-linux, mount, ebtables, ethtool, init-system-helpers (>= 1.18~)
Description: Kubernetes Node Agent
 The node agent of Kubernetes, the container cluster manager
Homepage: https://kubernetes.io
Filename: pool/kubelet_1.5.1-00_amd64_bb82dd4bcf0c9bc813c599f62afa48832bf34302d723c5a38347c2754f3735e2.deb
Priority: optional
SHA256: bb82dd4bcf0c9bc813c599f62afa48832bf34302d723c5a38347c2754f3735e2
Section: misc
Size: 15118582
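Since each entry is a block of "Field: value" lines terminated by a blank line, the fields that matter for mirroring can be pulled out with a few lines of awk. The following is only a sketch: list_pool_entries is a hypothetical helper name, and it assumes the Filename, SHA256 and Size fields each sit on a single line, as in the entry above.

```shell
# Print "filename sha256 size" for every stanza in a Debian Packages file.
# list_pool_entries is a hypothetical helper name, not part of apt.
list_pool_entries() {
    awk '
        /^Filename:/ { file = $2 }
        /^SHA256:/   { sum  = $2 }
        /^Size:/     { size = $2 }   # anchored, so Installed-Size is ignored
        # A blank line ends a stanza: print what was collected and reset.
        /^$/ { if (file != "") print file, sum, size; file = sum = size = "" }
        # The last stanza may not be followed by a blank line.
        END  { if (file != "") print file, sum, size }
    ' "$1"
}
```

Running list_pool_entries Packages in the architecture directory would print one line per package, which is convenient input for later download or verification steps.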
We are still pretty curious about why Google included SHA256 sums in the file names, something we have not seen in other open source projects. It might be an artifact of their build pipeline. This naming makes the repository harder to clone, and it is redundant because the Packages file already has a dedicated SHA256 field. Such naming conventions are often used to prevent indexing by brute force.
When we append a file path to the Google URL, we get a working link.
https://packages.cloud.google.com/apt/
+
pool/kubelet_1.5.1-00_amd64_bb82dd4bcf0c9bc813c599f62afa48832bf34302d723c5a38347c2754f3735e2.deb
=
https://packages.cloud.google.com/apt/pool/kubelet_1.5.1-00_amd64_bb82dd4bcf0c9bc813c599f62afa48832bf34302d723c5a38347c2754f3735e2.deb
Steps of cloning the repository
We start by pulling whatever wget can pull:
wget -r https://packages.cloud.google.com/apt/dists/kubernetes-xenial
This will pull some of the folders and the Packages file. After changing directory into the desired architecture, such as amd64, the following one-liner can be used to create the index that Google was not willing to share with us.
grep '^Filename: ' Packages | cut -d' ' -f2 | sed 's|^|https://packages.cloud.google.com/apt/|' > /tmp/links
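The one-liner above handles a single Packages file. If several architectures are mirrored, the same idea can be wrapped in a small function that accepts any number of Packages files and removes duplicate links. This is a sketch under assumptions: make_links and BASE_URL are names introduced here for illustration, and the base URL is the repository root used throughout this article.

```shell
# Repository root; every Filename field is relative to this URL.
BASE_URL="https://packages.cloud.google.com/apt"

# Build a de-duplicated URL list from one or more Packages files.
# make_links is a hypothetical helper name.
make_links() {
    # grep -h suppresses the file-name prefix when several files are given.
    grep -h '^Filename: ' "$@" |
        cut -d' ' -f2 |
        sed "s|^|$BASE_URL/|" |
        sort -u
}
```

It could be called as make_links binary-amd64/Packages binary-arm64/Packages > /tmp/links, where the two paths stand for whichever architecture directories were actually downloaded.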
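After creating the link list, we need to create a pool folder and change directory into it. Because the Filename paths are relative to the repository root, the pool folder belongs next to dists, inside the apt directory of the mirror. We can leverage GNU parallel to pull everything efficiently.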
mkdir pool
cd pool
cat /tmp/links | parallel --gnu "wget {}"
The -c parameter can be added to wget to resume interrupted downloads.
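Because the Packages file carries a SHA256 field for every package, the downloaded files can be checked before the mirror is used. The sketch below converts the Packages file into the "checksum, two spaces, file name" format that sha256sum -c reads. The function names are hypothetical, and it assumes the files were saved in the pool folder under the base name of their Filename field, which is what wget does by default.

```shell
# Turn a Packages file into input for "sha256sum -c".
# Each output line is "<sha256>  <file name>".
packages_to_sha256sums() {
    awk '
        # Keep only the base name, because wget saves files without the pool/ prefix.
        /^Filename:/ { n = split($2, parts, "/"); file = parts[n] }
        /^SHA256:/   { sum = $2 }
        /^$/ { if (file != "" && sum != "") print sum "  " file; file = sum = "" }
        END  { if (file != "" && sum != "") print sum "  " file }
    ' "$1"
}

# Verify every package in pool_dir ($2) against the Packages file ($1).
# Returns non-zero if any file is missing or has a wrong checksum.
verify_pool() {
    packages_to_sha256sums "$1" | (cd "$2" && sha256sum -c -)
}
```

For example, verify_pool /path/to/Packages /path/to/pool prints one OK or FAILED line per package, with both paths replaced by the real locations in the mirror.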
Now we have all the data. We can test the repository by adding the following file to the apt sources directory.
cat <<EOF | sudo tee /etc/apt/sources.list.d/kubernetes.list
deb file:/path/of/the/downloaded/repo/apt kubernetes-xenial main
EOF
curl -s https://packages.cloud.google.com/apt/doc/apt-key.gpg | sudo apt-key add -
sudo apt -oDebug::pkgAcquire::Worker=1 install kubelet
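If apt reports missing files, it can help to check that every entry in the Packages file actually exists in the local copy. The following sketch lists the entries that are absent. missing_packages is a hypothetical helper name, and the first argument is assumed to be the directory that contains both dists and pool.

```shell
# Report Packages entries whose file is missing from the local mirror.
# $1: repository root (the directory holding dists/ and pool/)
# $2: path to a Packages file
missing_packages() {
    repo_root=$1
    packages_file=$2
    grep '^Filename: ' "$packages_file" | cut -d' ' -f2 |
    while read -r rel; do
        # Filename fields are relative to the repository root.
        [ -f "$repo_root/$rel" ] || printf '%s\n' "$rel"
    done
}
```

An empty result means every file referenced by that Packages file is present. Any path that is printed can be downloaded again by prefixing it with the repository URL.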