I recently started learning machine learning and often need to reinstall my environment, but machine learning packages can take up several GB of space. Even though my connection is 100 Mbps, it still feels slow when there are dozens of GB to download.
So I looked into how to set the cache directories for APT and pip, so that I don't have to download everything again each time.
APT Cache
Refer to here
Add the following section to any file in the /etc/apt/apt.conf.d/ directory. The directory path can be changed as needed.

    // Either form sets the same option; one of them is enough.
    Dir { Cache "/cache/apt"; };
    Dir::Cache "/cache/apt";
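As a rough sketch of the step above, the setting can be written to its own drop-in file instead of editing an existing one. The function name `write_apt_cache_conf` and the file name `99cache-dir` are my own hypothetical choices, not something APT requires.

```shell
#!/bin/sh
# Sketch: write an apt drop-in that points Dir::Cache at a custom directory.
# write_apt_cache_conf and the name 99cache-dir are hypothetical, for illustration only.
write_apt_cache_conf() {
    conf_file=$1   # e.g. /etc/apt/apt.conf.d/99cache-dir
    cache_dir=$2   # e.g. /cache/apt
    # apt.conf syntax: option name, quoted value, trailing semicolon.
    printf 'Dir::Cache "%s";\n' "$cache_dir" > "$conf_file"
}

# Usage (as root):
#   write_apt_cache_conf /etc/apt/apt.conf.d/99cache-dir /cache/apt
#   apt-config dump | grep '^Dir::Cache '
```

If the setting was picked up, the `apt-config dump` line should show the new directory rather than the default one.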
PIP Cache
Refer to here. Add the following line to the end of /etc/profile:

    export PIP_CACHE_DIR=/cache/pip
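Since the environment gets reinstalled often, it may help to add the line only when it is not already present. The following is a minimal sketch of that idea; `add_export` is a hypothetical helper name, not part of pip.

```shell
#!/bin/sh
# Sketch: append "export NAME=value" to a profile file only if it is not already there.
# add_export is a hypothetical helper, for illustration only.
add_export() {
    profile=$1; name=$2; value=$3
    line="export $name=$value"
    # -x matches the whole line, -F treats it as a fixed string, -q keeps grep silent.
    grep -qxF "$line" "$profile" 2>/dev/null || printf '%s\n' "$line" >> "$profile"
}

# Usage (as root):
#   add_export /etc/profile PIP_CACHE_DIR /cache/pip
```

After the next login, recent pip versions should report the new location when running `pip cache dir`.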
Anaconda
Anaconda can download packages to a shared directory specified by the CONDA_PKGS_DIRS environment variable. Add the following line to the end of /etc/profile.

    export CONDA_PKGS_DIRS=/cache/conda_pkgs
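To confirm that a new login shell actually sees these variables, something like the sketch below could be used. `check_cache_var` is a hypothetical helper, and it assumes each variable holds a single directory.

```shell
#!/bin/sh
# Sketch: report whether a cache variable is exported and its directory is writable.
# check_cache_var is a hypothetical helper; it assumes one directory per variable.
check_cache_var() {
    name=$1
    # printenv exits non-zero when the variable is not in the environment.
    dir=$(printenv "$name") || { echo "$name: not set"; return 1; }
    if [ -d "$dir" ] && [ -w "$dir" ]; then
        echo "$name: $dir (writable)"
    else
        echo "$name: $dir (missing or not writable)"
        return 1
    fi
}

# Usage (in a fresh login shell):
#   check_cache_var PIP_CACHE_DIR
#   check_cache_var CONDA_PKGS_DIRS
```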
Permission issue
When several users share the same pip and conda cache, permission conflicts may occur between them. This can be resolved by giving the cache a common group and group write permission.

    groupadd cudausers               # add a new group for the cache
    chgrp -R cudausers /cache        # change the group of the cache
    chmod -R g+ws /cache             # group-writable, with setgid so new files inherit the group
    usermod -a -G cudausers ubuntu   # add the user to the group
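A quick way to check that the permissions took effect is to look at the mode string of the directory. This is only a sketch; `is_group_shared` is a hypothetical helper name.

```shell
#!/bin/sh
# Sketch: succeed only if a directory is group-writable and has the setgid bit.
# is_group_shared is a hypothetical helper, for illustration only.
is_group_shared() {
    perms=$(ls -ld "$1") || return 1
    case $perms in
        # d, three owner bits, group read (any), group write, then s or S for setgid.
        d????w[sS]*) return 0 ;;
        *)           return 1 ;;
    esac
}

# Usage:
#   is_group_shared /cache && echo "shared cache is set up"
```

Note that a user who was just added to the group needs to log in again before the new membership applies.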
Other
There may be other downloads in the $HOME/.cache directory (for example, GPT-J's pre-training data, if any). If necessary, these can also be linked to another location, for example with a symlink.
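One possible way to do the linking is sketched below: move the existing directory to the shared location, then leave a symlink behind. `relocate_cache` and the example paths are hypothetical.

```shell
#!/bin/sh
# Sketch: move a cache directory elsewhere and leave a symlink in its place.
# relocate_cache and the example paths below are hypothetical.
relocate_cache() {
    src=$1    # e.g. "$HOME/.cache/some_tool"
    dest=$2   # e.g. /cache/some_tool
    [ -L "$src" ] && return 0              # already a symlink, nothing to do
    if [ -d "$src" ]; then
        if [ -e "$dest" ]; then
            echo "refusing to overwrite existing $dest" >&2
            return 1
        fi
        mkdir -p "$(dirname "$dest")"
        mv "$src" "$dest"                  # keep what was already downloaded
    else
        mkdir -p "$dest" "$(dirname "$src")"
    fi
    ln -s "$dest" "$src"
}

# Usage:
#   relocate_cache "$HOME/.cache/some_tool" /cache/some_tool
```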