building our repository (builder)
Re: building our repository
kraileth, any news for the server?
Re: building our repository
ASX wrote: I'm recording the following info for my own use:
while building editors/openoffice-4 I measured a total tmpfs usage of approx. 14 GB (11.9 + 2.0);
editors/openoffice-devel instead failed, exceeding the max tmpfs size (currently 12 GB).

OOo seems to be a real monster! But it's good to know what dimensions package building can take.
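For the record, numbers like these are easy to capture while a big port builds. A minimal sketch (Synth puts its builders on tmpfs, so filtering df by filesystem type is enough; the log path is just an example):

    # In a second terminal, log tmpfs usage once a minute during the
    # build; the peak value is the figure we care about.
    while true; do
        df -h -t tmpfs >> /tmp/tmpfs-usage.log
        sleep 60
    done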
BTW: Since I switched from portmaster to Synth on my PC at work, I noticed that the osinfo-db-tools port would not build in a clean environment. I took a look at it, found a fix and mailed the maintainer. Despite him obviously not using Synth, he committed a fix within hours! Right now another update broke fossil and I'm going to notify the maintainer, too. It's good to know that people are open and friendly towards Synth even though most people seem to be using Poudriere. And it will probably be helpful to those other people who want to use Synth if we're using it large-scale. So GhostBSD might be doing a community service even for users of other FreeBSD flavors here.

ericbsd wrote: kraileth, any news for the server?

I made an appointment with my boss for Monday. We'll be able to discuss things then.
Re: building our repository
kraileth wrote: But it's good to know what dimensions package building can take.

Thanks. Well, this is what I'm doing really: collecting as much info as possible about package-building requirements.
It is clear that I cannot push the current server beyond certain limits (I already did, and crashed it), so I took the opportunity to gain as much knowledge as possible.
I made an appointment with my boss for Monday. We'll be able to discuss things then.
While I would really prefer a dedicated machine for building, I have also considered re-using the current server; that would be possible but would require a complete reinstall (basically using a different disk layout: more swap, some UFS partition, some "striped" ZFS dataset, ...).
I will make a recap sometime later today or tomorrow.

Re: building our repository
Another note to self:
While building everything, some packages are skipped for various reasons; among them, some aren't built because they require the sources in /usr/src (example: virtualbox-ose-kmod).
I hadn't realized that because the sources are installed on my system, but they need to be installed on the server too.
That means packages are built against a specific source tree patch level.
Therefore we will be required to always have the latest patch-level update properly and promptly installed, ideally before starting to build the packages.
It also implies that any time the system is updated, the packages should be rebuilt.
(I guess synth can and does autodetect the required rebuilds, but I haven't verified that; I will at the next system update.)
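For what it's worth, a dry run makes this easy to check without starting a build. A sketch of the sequence (synth status reports what it would rebuild for the current profile):

    # Bring the installed system to the latest patch level first, since
    # ports like virtualbox-ose-kmod build against /usr/src:
    freebsd-update fetch install

    # Then ask synth what it would rebuild, without building anything:
    synth status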
~~~
Currently the server is rebuilding the whole amd64 repo, using 3 builders with 3 jobs each.
It has reached approximately one third of the total (7700 pkgs in 22 hours); the current rate is 350 pkgs per hour, which means approx. 116 pkgs per hour per builder (peak speed: 900 pkgs/hour).
Right now it is not possible to use more than 3 builders, but I would expect to run at least 6 or 8 builders on the definitive build machine, which should allow for approximately 600-700 pkgs per hour globally, thus requiring a day and a half to build a complete repo.
Upon updates, say one week later, I would expect a relatively low number of packages to need rebuilding (say 5000 for a quarterly repository), and thus approximately 10 hours for each weekly update. You get the idea.
Besides getting the new server, I would be inclined to reconfigure the current Canadian server now; that would allow us to:
a) test a more performant configuration
b) have a backup build machine, just in case...
Considering that we are currently using a mirrored setup, all we would need to do is:
undo the mirror, set up the system on the freed disk, transfer the data onto that disk, reboot, then free the second disk, and finally complete the setup, restoring the mirror where needed.
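Roughly, and assuming the root pool is a plain two-disk ZFS mirror (pool and device names below are placeholders), the shuffle would look like this:

    # 1) Break the mirror, freeing the second disk:
    zpool detach zroot ada1

    # 2) Partition the freed disk with the new layout (more swap, a UFS
    #    partition, ...) and install/copy the system onto it, then
    #    reboot from it. (Details depend on the layout we settle on.)

    # 3) Free the old disk and re-attach it wherever a mirror is still
    #    wanted, e.g.:
    zpool attach newpool ada1p3 ada0p3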
Eric, let me know what you think; if you agree, we will prepare a detailed plan so that there will be no downtime except for a quick reboot.

Re: building our repository
The last complete build produced 24,000 pkgs at a rate of 330 pkgs/hour, using 3 builders, over a total of 70 hours, before it ran out of RAM.
I was able to restart the server; since then I have reconfigured synth to use only two builders, and it is now completing the last 1500 pkgs (unfortunately it needs to retry the build failures, 30 or so, which is time consuming).
The next interesting thing to see will be how many pkgs get rebuilt on a quarterly ports tree update.
After that, there should be a new quarterly branch soon; I want to see how many pkgs will be updated.
Re: building our repository
Just wanted to give an update on the server question:
I talked to my boss. In general he has agreed to support this to a large degree, but the usual pricing for our servers is quite a bit more expensive than the OVH offers that Eric linked to... We can get quite a discount, though, but it totally depends on what exactly we need, as there's one condition: it must be a kind of server that we already have, because for a small monthly fee it's impossible to order a new machine. Common replaceable parts like RAM or disks are not much of a problem.
So: ASX, what's your current take on what specs make sense for a build machine? I read that you even considered sticking with the one server that is currently available. That would be the fallback plan, but I'd prefer even a mediocre piece of hardware over having one server do everything. Package building puts quite a bit of load on it, and we will need to consider even more services in the future. Eric told me that he'd like to run a mailserver again - and we seriously need to consider something like Let's Encrypt. The new Firefox didn't want to let me log in on the forum without big warnings... Backups and things like that will also have to be done.
What do you currently estimate the RAM requirements to be? And how much space do we need on SSD and on HDD? As for the CPU, faster is better of course, but I tend to say that this is a "soft" requirement, while RAM and disk space are hard requirements where we cannot make a lot of trade-offs.
Re: building our repository
I'll try to provide info that's as useful as I can, but please consider that I'm gaining experience in the field right now, so it's unlikely I have a definitive answer.
Understanding package building (using synth)
There are a couple of things that are (or should be) mostly one-time only:
a) fetching distfiles (the software sources): this can be considered expensive only the first time, and right now we have already collected 120 GB of distfiles (practically all of them). After that they get updated, but only a few at a time. Distfiles are (mostly) common to both the 32- and 64-bit archs.
b) ccache: the cache is populated on the first compilation and subsequently reused on later compilations (rebuilds because of a dependency change or because of changed options).
I had managed to collect up to 90 GB of ccache, but most of it disappeared after last night's crash (and that was not because of filesystem corruption; the reason is unknown to me at this time); the current ccache is only 30 GB.
The ccache will differ between 32 and 64 bit, and therefore we should account for 128 + 128 GB for ccache; for performance reasons these caches should reside on UFS with soft updates (no journaling).
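As a sketch, preparing such a cache partition could look like this (the device name and mount point are examples; the cache directory is then pointed at via synth configure):

    # UFS with soft updates enabled, no journaling:
    newfs -U /dev/ada2p1
    mkdir -p /var/cache/ccache
    mount /dev/ada2p1 /var/cache/ccache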
Because of the above two requirements, the first build will always be the slowest one (mostly because of the distfiles).
The second and subsequent builds will take advantage of the already-downloaded distfiles and of the already-built ccache.
The last build (still running right now) is just completing, and the approximate rate is 110 packages/hour per builder.
The question is: how many builders can be run in parallel?
That's the most difficult thing to answer. Right now, because we are running practically without swap (only 1 GB), only two builders may be supported; on the other hand, while building small packages (and they are the vast majority, say 98%) we can safely run 6 builders, or more if we add swap.
The next question would be: how linearly will the output increase as we add builders?
The answer depends on several factors and is also not easy to give: when using multiple builders at the same time we can get lucky, and while one builder is running the compiler, another may be doing a different, possibly single-core task (examples: fetching, configure, build-depends, package); some of these tasks are disk-I/O intensive (build-depends), some are not... many builders may find themselves competing for disk I/O.
Overall, considering that the majority of packages are small, I would be inclined to maximize the number of builders. I have also seen that overloading the CPU (a high load factor) doesn't hurt system responsiveness.
Overloading the disk I/O, instead, is going to be much more problematic, and I suspect it can get even worse if/when the system starts using swap.
What would help here is to distribute the I/O load across different disks, ideally four, but we can fall back to two:
1) ccache, 2) swap, 3) ports and distfiles, 4) repository.
Of course, SSDs in place of rotational disks would also help (for ccache and swap).
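To illustrate the four-disk split, an /etc/fstab along these lines (all device names and mount points are placeholders, not our actual layout):

    # disk 1 (SSD): compiler cache
    /dev/ada1p1   /var/cache/ccache   ufs    rw   2   2
    # disk 2 (SSD): swap
    /dev/ada2p1   none                swap   sw   0   0
    # disk 3: ports tree and distfiles
    /dev/ada3p1   /usr/ports          ufs    rw   2   2
    # disk 4: the resulting package repository
    /dev/ada4p1   /var/synth/repo     ufs    rw   2   2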
~~~
Having said all that, we need to consider what we are going to do: how frequently will we update our repo?
The starting point I have in mind is once per week (for both repos, 32 and 64 bit); while the first build will take 3 days or so for each repo, the weekly updates should be done in less than 24 hours.
We will base our repo on quarterly, and therefore we will perform a full rebuild only 4 times per year.
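As a sketch, the weekly cycle could then be scripted like this (assuming the ports tree is an svn checkout of the quarterly branch; paths are examples):

    #!/bin/sh
    # Weekly repository update.
    svn update /usr/ports        # refresh the quarterly checkout
    synth status                 # dry run: see how many pkgs would rebuild
    synth everything             # rebuild what is outdated or missing
    synth rebuild-repository     # regenerate the pkg(8) repo metadata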
~~~
RECAP
- as much RAM as possible: minimum 32 GB; 48 would be much better, 64 even more so.
- disks: minimum two, optimally 4; SSDs would be best (the total storage requirement is around 600 GB, say 1 TB, because we will also run the webserver that distributes the packages there).
- CPU: minimum 4 cores / 8 threads... more is better.
Now it depends on what you can get: we can easily trade off RAM for SSDs or vice versa;
also, we can use more cores and slower disks, or more RAM and slower disks.
Just having a dedicated machine similar to the current Canadian server would be an improvement, so you can take that configuration as a starting point.
If you have any further questions, please ask - or simply join IRC at #ghostbsd-server.
Also, if you think I'm missing something, please let me know. Thanks.
Re: building our repository
The amd64 build completed this evening (what remains are the packages that depend on the kernel src and on lib32, which I have not installed).
More or less a week has passed since the build was started. I have now updated the ports tree (branches/2017Q1), and a synth status request reports that it will rebuild approx. 5000 packages. Give or take (±100%), this is the quantity of packages we can expect to rebuild each week.
Considering a build speed of 500 pkgs per hour, that is quite manageable, whether weekly or even more often.
It remains to be seen how many packages will be rebuilt when a new branch (2017Q2) is created; my estimate is about 15,000 pkgs. We will see.
I'm going to start the "weekly update", using 2 builders with 6 jobs per builder.
Like before, it will unfortunately retry the failed packages, some of which are time consuming, so expect a lower build rate.
Re: building our repository
kraileth, an old machine makes perfect sense; I am not asking for a new rig!
I think ASX sums it up well. The only thing I would like to know is how much it will cost us; we can easily go to 100 per month, no problem.

Re: building our repository
ericbsd wrote: kraileth, an old machine makes perfect sense; I am not asking for a new rig! I think ASX sums it up well. The only thing I would like to know is how much it will cost us; we can easily go to 100 per month, no problem.

Yes, the equivalent of 100 CAD (70 €) is more or less exactly the price I was offered (for normal customers we charge about three times as much); we only have to settle on the hardware. My colleague will take a look around today, and then my boss will decide what the final offer will be. I would imagine that the processor won't be top-notch, but I think the priorities are: 32 GB of RAM and SSD storage (probably mixed, with 2 SSDs for building purposes and 2 HDDs for storage).
ASX wrote: The amd64 build completed this evening (what remains are the packages that depend on the kernel src and on lib32, which I have not installed).
More or less a week has passed since the build was started. I have now updated the ports tree (branches/2017Q1), and a synth status request reports that it will rebuild approx. 5000 packages. Give or take (±100%), this is the quantity of packages we can expect to rebuild each week.
Considering a build speed of 500 pkgs per hour, that is quite manageable, whether weekly or even more often.

That sounds very good! Weekly will be perfect, I guess. In that case the machine could perhaps also assume a second role: hosting backups of the Canadian server. This shouldn't put too much extra load on it, especially if we throttle the traffic. IMO this is something we should consider.
ASX wrote: It remains to be seen how many packages will be rebuilt when a new branch (2017Q2) is created; my estimate is about 15,000 pkgs. We will see.
I'm going to start the "weekly update", using 2 builders with 6 jobs per builder.
Like before, it will unfortunately retry the failed packages, some of which are time consuming, so expect a lower build rate.

I'd like Synth to grow a feature like this: recognize the "biggies" (webkit*, chromium, ...) from a list and use a different profile for them. It would be nice if Synth were flexible in this regard and e.g. added a "reserved" builder state or something, using fewer (only one?) builders when it is going to build one of the XXL packages and a higher job count (basically adding the number of jobs of the "reserved" builders to the active builder). This is because I've often had all packages built except for Chromium - and that one took around 9 hours to build with just two jobs. At the same time, 5 other builders had already reached the "shutdown" state; if Chromium were built with e.g. 12 jobs instead, it would be done much, much more quickly. Any thoughts on this? Perhaps this could be a future feature request for Synth.
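Until something like that exists, a two-phase workaround might approximate it (a sketch; the port list is illustrative, and the two profiles would be prepared beforehand with synth configure):

    #!/bin/sh
    # Phase 1: with a profile set to 1 builder / 12 jobs, build the
    # known biggies alone so each gets all the cores:
    synth just-build www/chromium www/webkit2-gtk3

    # Phase 2: switch back to the many-builders profile and build the
    # rest as usual:
    synth everything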