

So many awesome AI features, and I just switched to helix /s
For now, disabling archives plus my simple list of bots to drop in Nginx seems to work very well: it doesn’t create the archives anymore, and the load on the server went down too.
Yep, off to prison you go! /s
The funniest part of the post is the last line :D
Hm, but this only works on tmpfs, which lives in memory. It seems I could have done it with XFS too: https://fabianlee.org/2020/01/13/linux-using-xfs-project-quotas-to-limit-capacity-within-a-subdirectory/ but I used ext4 out of habit.
For now I asked ChatGPT to help me implement a simple return 403 based on the bot user agent. I looked into my logs and collected the bot names I saw there. I know it won’t hold forever, but for now it’s quite nice: I just added this file as /etc/nginx/conf.d/block_bots.conf and it gets run before all the vhosts and rejects all bots. The rest just goes to the vhosts as normal. This way I don’t need to implement it in each vhost separately.
➜ jeena@Abraham conf.d cat block_bots.conf
# /etc/nginx/conf.d/block_bots.conf

# 1️⃣ Map user agents to $bad_bot
map $http_user_agent $bad_bot {
    default 0;
    ~*SemrushBot 1;
    ~*AhrefsBot 1;
    ~*PetalBot 1;
    ~*YisouSpider 1;
    ~*Amazonbot 1;
    ~*VelenPublicWebCrawler 1;
    ~*DataForSeoBot 1;
    ~*Expanse,\ a\ Palo\ Alto\ Networks\ company 1;
    ~*BacklinksExtendedBot 1;
    ~*ClaudeBot 1;
    ~*OAI-SearchBot 1;
    ~*GPTBot 1;
    ~*meta-externalagent 1;
}

# 2️⃣ Global default server to block bad bots
server {
    listen 80 default_server;
    listen [::]:80 default_server;
    listen 443 ssl default_server;
    listen [::]:443 ssl default_server;

    # dummy SSL cert for HTTPS
    ssl_certificate /etc/ssl/certs/ssl-cert-snakeoil.pem;
    ssl_certificate_key /etc/ssl/private/ssl-cert-snakeoil.key;

    # block bad bots
    if ($bad_bot) {
        return 403;
    }

    # close connection for anything else hitting default server
    return 444;
}
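If some of those bots still show up in a vhost’s access log, that may be because nginx only hands a request to the default server when the Host header matches no server_name. The $bad_bot variable from the map is defined at the http level, so it can be checked inside a named vhost as well. A minimal sketch of that, where the server name, certificate paths and upstream address are all made-up placeholders and not my real setup:

```nginx
# Hypothetical vhost: git.example.org, the certificate paths and the
# upstream port are placeholders for illustration only.
server {
    listen 443 ssl;
    listen [::]:443 ssl;
    server_name git.example.org;

    ssl_certificate     /etc/ssl/certs/git.example.org.pem;
    ssl_certificate_key /etc/ssl/private/git.example.org.key;

    # $bad_bot is set by the map in /etc/nginx/conf.d/block_bots.conf
    if ($bad_bot) {
        return 403;
    }

    location / {
        # assumed upstream, e.g. a container listening on localhost
        proxy_pass http://127.0.0.1:3000;
    }
}
```

Running nginx -t before reloading should catch any syntax mistakes in either file.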
I already have LVM, but so far I’ve only used it to combine drives. It’s not a bad idea though: if I can’t do it with Docker, at least that would be another solution.
Ok, there was one issue already and I added my comment to it: https://codeberg.org/forgejo/forgejo/issues/7011#issuecomment-7022288
Sadly that’s not the solution to my problem. The whole point of open-sourcing for me is to make it accessible to as many people as possible.
Hm, I’m afraid none of them really seems to cover the repo-archives case, so I suspect size:all doesn’t include the repo-archives either.
But I’m running it in a container, perhaps I can limit the size the container gets assigned.
I have monitoring for it, but it happened during the night when I was sleeping.
Actually I saw a lot of Forgejo activity on the server yesterday but didn’t think it would go so fast.
There is no setting like that, at least I can’t find it.
Codeberg is an instance of Forgejo; I run my own instance because I don’t want to be dependent on others.
I need to look into it, thanks!
Yeah, I really need to figure out how to do quotas per service.
But then how do people who search for code like yours find your open source code, if not through a search engine which uses an indexing bot?
It makes a zip file and a tarball, and keeps them cached for other people to download in the future.
I also thought about it, but the custom domain feature only works on the $5 / month plan.
I use Radicale for it.