Your Name
2025-10-26 11:34:23 +00:00
9 changed files with 237 additions and 79 deletions

1
.gitignore vendored

@@ -161,3 +161,4 @@ cython_debug/
# option (not recommended) you can uncomment the following to ignore the entire idea folder.
#.idea/
.aider*
CLAUDE.md

81
CLAUDE.md Normal file

@@ -0,0 +1,81 @@
# CLAUDE.md
This file provides guidance to Claude Code (claude.ai/code) when working with code in this repository.
## Development Commands
### Local Development
```bash
# Install dependencies
pip install -r app/requirements.txt
# Run the Flask app locally
cd app
gunicorn --workers 1 --threads 4 sw:app
# Access at http://127.0.0.1:8000
```
### Docker Development
```bash
# Build and run with Docker
docker build -t smallweb .
docker run -p 8080:8080 smallweb
```
### Maintenance
```bash
# Crawl all feeds (expensive operation)
cd maintenance
./crawl.sh
# Process crawl results and clean up feeds
./process.sh
```
## Project Architecture
Kagi Small Web is a feed aggregation platform that curates and displays content from the "small web": personal blogs, independent YouTube channels, and webcomics. The system operates as a Flask web application with background feed processing.
### Core Components
**Main Application (`app/sw.py`)**
- Flask web server serving random posts from curated feeds
- Background feed updates every 5 minutes using APScheduler
- User interaction features: emoji reactions, notes, content flagging
- Iframe embedding for seamless content viewing
- Multiple content modes: blogs, YouTube videos, GitHub projects, comics
**Feed Management System**
- `smallweb.txt`: Personal blog RSS/Atom feeds (thousands of entries)
- `smallyt.txt`: YouTube channel feeds with subscriber/frequency limits
- `smallcomic.txt`: Independent webcomic feeds
- `yt_rejected.txt`: Rejected YouTube channels for reference
**Data Persistence**
- `data/favorites.pkl`: User emoji reactions stored as OrderedDict per URL
- `data/notes.pkl`: User notes with timestamps per URL
- `data/flagged_content.pkl`: Content flagging counts
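
The persistence layout above could be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the code in `app/sw.py`: the helper names `load_pickle` and `save_pickle` and the sample data are hypothetical.

```python
import pickle
from collections import OrderedDict


def load_pickle(path, default):
    """Return the unpickled object at `path`, or `default` if it cannot be read."""
    try:
        with open(path, "rb") as file:
            return pickle.load(file)
    except (OSError, EOFError, pickle.UnpicklingError):
        # Missing or corrupt file: start from the supplied default.
        return default


def save_pickle(path, data):
    """Serialize `data` to `path`, overwriting any previous contents."""
    with open(path, "wb") as file:
        pickle.dump(data, file)


# Hypothetical sample: one OrderedDict of emoji -> count per URL.
favorites = {"https://example.com/post": OrderedDict([("heart", 2), ("thumbs_up", 1)])}
```

A caller would typically load each file once at startup with an empty `dict` as the default, then call `save_pickle` after every mutation it wants to persist.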
### Feed Processing Pipeline
1. **Ingestion**: Fetches from Kagi's Small Web API (`/api/v1/smallweb/feed/`)
2. **Filtering**: YouTube Shorts removal, image detection for comics
3. **Caching**: In-memory storage with periodic updates
4. **Generation**: Creates appreciated feed and OPML export
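
The filtering step can be illustrated with a short sketch. It assumes entries are `(link, title, author, description, updated)` tuples and that a YouTube Short can be recognised by `/shorts/` in its link; that detection rule, and the function names below, are assumptions made for illustration rather than the project's confirmed logic.

```python
from typing import List, Tuple

# Assumed entry shape: (link, title, author, description, updated)
Entry = Tuple[str, str, str, str, str]

IMAGE_MARKERS = ("<img", ".png", ".jpg", ".jpeg")


def is_embeddable(entry: Entry) -> bool:
    """Keep only https:// links so the page can be shown in an iframe."""
    return entry[0].startswith("https://")


def is_short(entry: Entry) -> bool:
    """Assumption: a YouTube Short has '/shorts/' in its link."""
    return "/shorts/" in entry[0]


def has_image(entry: Entry) -> bool:
    """A comic entry qualifies only if its description references an image."""
    description = entry[3]
    return bool(description) and any(marker in description for marker in IMAGE_MARKERS)


def filter_videos(entries: List[Entry]) -> List[Entry]:
    """Drop non-https links and Shorts from a list of video entries."""
    return [e for e in entries if is_embeddable(e) and not is_short(e)]


def filter_comics(entries: List[Entry]) -> List[Entry]:
    """Drop non-https links and entries without any image reference."""
    return [e for e in entries if is_embeddable(e) and has_image(e)]
```

Keeping each rule in its own predicate makes it easy to test one rule at a time and to combine them differently per content type.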
### User Features
- **Random Discovery**: Algorithmic selection from curated feeds
- **Content Types**: Blogs (`?mode=0`), YouTube (`?yt`), Appreciated (`?app`), GitHub (`?gh`), Comics (`?comic`)
- **Search**: Full-text search across titles, authors, descriptions
- **Reactions**: 14 emoji types, at most 3 per URL; posts with reactions are included in the Appreciated feed automatically
- **Personal Notes**: Timestamped annotations per URL
- **Content Moderation**: Community flagging system
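
As a rough sketch of how the flag-style query parameters listed above could be mapped to a content type, the snippet below uses the standard library's `parse_qs`. The function name `resolve_mode`, the returned labels, and the precedence order of the flags are assumptions for illustration; the actual route handler may resolve modes differently.

```python
from urllib.parse import parse_qs

# Flag-style query keys, checked in this (assumed) order.
MODE_FLAGS = ("yt", "app", "gh", "comic")


def resolve_mode(query_string: str) -> str:
    """Return the content type selected by a query string, defaulting to 'blogs'."""
    # keep_blank_values=True so a bare flag such as "?yt" is not discarded.
    params = parse_qs(query_string, keep_blank_values=True)
    for flag in MODE_FLAGS:
        if flag in params:
            return flag
    return "blogs"
```

For example, `resolve_mode("comic&search=cats")` selects the comic mode while leaving the `search` parameter available to the rest of the handler.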
## Deployment
The application deploys to Google Cloud Run with:
- GCS bucket mounting via gcsfuse for persistent data
- Cloud Build pipeline (`cloudbuild.yaml`)
- Service account with appropriate IAM permissions
- Auto-scaling with 2-4 instances

BIN
app/static/HumanWeb.png Normal file

Binary file not shown. Size: 50 KiB


@@ -8,6 +8,7 @@ body {
}
#header {
position: fixed;
top: 0;
@@ -27,6 +28,7 @@ body {
#controls {
display: flex;
align-items: center;
gap: 5px;
}
#search-form {
@@ -42,10 +44,10 @@ body {
#search-input {
font-family: pixel, Arial, sans-serif;
font-size: 18px;
padding: 5px 25px 5px 5px;
padding: 5px 25px 5px 10px;
border: none;
border-radius: 5px;
width: 200px;
width: 280px;
height: 38px;
box-sizing: border-box;
}
@@ -144,11 +146,12 @@ body {
overflow: hidden;
white-space: nowrap;
text-overflow: ellipsis;
max-width:21vw;
max-width:20vw;
background-color: rgba(255, 255, 255, 0.1);
border-radius: 5px;
padding: 5px 10px;
min-width:300px;
min-width:180px;
margin: 0 10px;
}
/* --- always show the compact “phone” url label --- */
@@ -181,6 +184,10 @@ body {
cursor: pointer;
}
#controls .flag-link {
margin: 0 8px;
}
#header .favorite-link:hover, #header .flag-link:hover {
background-color: transparent;
}
@@ -197,31 +204,38 @@ body {
.right {
flex: 1;
text-align: right;
max-width: 410px;
max-width: 480px;
display: flex;
align-items: center;
justify-content: flex-end;
}
#search-form {
margin-right: 10px;
}
#content {
padding-top: 60px;
position: fixed;
top: 60px;
left: 0;
width: 100%;
height: calc(100vh - 60px);
box-sizing: border-box;
}
#content iframe {
border: none;
width:100%;
height:100%;
width: 100%;
height: 100%;
}
#content-yt {
position: fixed;
top: 60px;
left: 0;
display: flex;
justify-content: center;
align-items: center;
height: calc(100vh - 58px);
width: 100%;
height: calc(100vh - 60px);
box-sizing: border-box;
}
#content-yt iframe {
@@ -231,7 +245,7 @@ body {
.popup-container {
display: inline-block;
position: relative;
padding: 0 10px;
padding: 0 8px;
}
.popup {


@@ -6,13 +6,14 @@ from flask import (
redirect,
render_template,
Response,
jsonify,
)
from html import escape
import feedparser
import feedparser
from apscheduler.schedulers.background import BackgroundScheduler
import random
from datetime import datetime
from datetime import datetime, timedelta
from urllib.parse import urlparse, parse_qs
from urllib.parse import urlencode
import atexit
@@ -21,6 +22,8 @@ import time
from urllib.parse import urlparse
from feedwerk.atom import AtomFeed
from collections import OrderedDict
import uuid
import json
appreciated_feed = None # Initialize the variable to store the appreciated Atom feed
opml_cache = None # will hold generated OPML xml
@@ -94,6 +97,8 @@ def time_ago(timestamp):
return f"{int(seconds // 86400)} days"
random.seed(time.time())
@@ -119,7 +124,9 @@ def update_all():
new_entries = update_entries(url + "?nso") # no same origin sites feed
if not bool(urls_cache) or bool(new_entries):
urls_cache = new_entries
# Filter out YouTube URLs from main feed
urls_cache = [entry for entry in new_entries
if "youtube.com" not in entry[0] and "youtu.be" not in entry[0]]
new_entries = update_entries(url + "?yt") # youtube sites
@@ -133,29 +140,30 @@ def update_all():
urls_gh_cache = new_entries
new_entries = update_entries(url + "?comic") # comic sites
if not bool(urls_comic_cache) or bool(new_entries):
# Filter entries that have images in content
urls_comic_cache = [
entry for entry in new_entries
entry for entry in new_entries
if entry[3] and ('<img' in entry[3] or '.png' in entry[3] or '.jpg' in entry[3] or '.jpeg' in entry[3])
]
# Prune favorites_dict to only include URLs present in urls_cache or urls_yt_cache
current_urls = set(entry[0] for entry in urls_cache + urls_yt_cache)
favorites_dict = {url: count for url, count in favorites_dict.items() if url in current_urls}
# Build urls_app_cache from appreciated entries in urls_cache and urls_yt_cache
urls_app_cache = [e for e in (urls_cache + urls_yt_cache)
if e[0] in favorites_dict]
# Generate the appreciated feed
generate_appreciated_feed()
# ---- NEW: update cached OPML ----
global opml_cache
opml_cache = generate_opml_feed()
except:
print("something went wrong during update_all")
finally:
@@ -193,6 +201,7 @@ def update_entries(url):
cache = [
(entry["link"], entry["title"], entry["author"], entry["description"], entry["updated"])
for entry in formatted_entries
if entry["link"].startswith("https://") # Only allow https:// URLs for iframe embedding
]
print(len(cache), "entries")
return cache
@@ -329,8 +338,10 @@ def index():
if "youtube.com" in short_url:
parsed_url = urlparse(url)
videoid = parse_qs(parsed_url.query)["v"][0]
current_mode = 1
query_params = parse_qs(parsed_url.query)
if "v" in query_params:
videoid = query_params["v"][0]
current_mode = 1
# get favorites
reactions_dict = favorites_dict.get(url, OrderedDict())
@@ -349,6 +360,7 @@ def index():
# get flagged content
flag_content_count = flagged_content_dict.get(url, 0)
if url.startswith("http://"):
url = url.replace(
@@ -425,14 +437,13 @@ def favorite():
# Regenerate the appreciated feed
generate_appreciated_feed()
# Save to disk
if (datetime.now() - time_saved_favorites).total_seconds() > 60:
time_saved_favorites = datetime.now()
try:
with open(PATH_FAVORITES, "wb") as file:
pickle.dump(favorites_dict, file)
except:
print("can not write fav file")
# Save to disk immediately (multi-instance deployment requires immediate persistence)
time_saved_favorites = datetime.now()
try:
with open(PATH_FAVORITES, "wb") as file:
pickle.dump(favorites_dict, file)
except:
print("can not write fav file")
# Preserve all query parameters except 'url'
query_params = request.args.copy()
@@ -530,6 +541,9 @@ def opml():
opml_cache = generate_opml_feed()
return Response(opml_cache, mimetype="text/x-opml+xml")
time_saved_favorites = datetime.now()
time_saved_notes = datetime.now()
time_saved_flagged_content = datetime.now()
@@ -579,6 +593,7 @@ except:
print("No flagged content data found.")
# get feeds
update_all()
@@ -590,4 +605,30 @@ scheduler.start()
scheduler.add_job(update_all, "interval", minutes=5)
def save_all_data():
"""Save all data before shutdown"""
print("[DEBUG] Saving all data before shutdown...")
try:
with open(PATH_FAVORITES, "wb") as file:
pickle.dump(favorites_dict, file)
print(f"[DEBUG] Saved {len(favorites_dict)} favorites")
except Exception as e:
print(f"Error saving favorites: {e}")
try:
with open(PATH_NOTES, "wb") as file:
pickle.dump(notes_dict, file)
print(f"[DEBUG] Saved {len(notes_dict)} notes")
except Exception as e:
print(f"Error saving notes: {e}")
try:
with open(PATH_FLAGGED, "wb") as file:
pickle.dump(flagged_content_dict, file)
print(f"[DEBUG] Saved {len(flagged_content_dict)} flagged items")
except Exception as e:
print(f"Error saving flagged content: {e}")
atexit.register(save_all_data)
atexit.register(lambda: scheduler.shutdown())


@@ -6,6 +6,15 @@
<meta http-equiv="X-UA-Compatible" content="IE=edge">
<meta name="viewport" content="width=device-width, initial-scale=1.0">
<title>Kagi Small Web</title>
<meta property="og:title" content="Kagi Small Web">
<meta property="og:description" content="Discover the small web - personal blogs, independent YouTube channels, and webcomics from genuine humans on the internet.">
<meta property="og:image" content="{{ url_for('static', filename='HumanWeb.png', _external=True) }}">
<meta property="og:type" content="website">
<meta property="og:url" content="{{ request.url }}">
<meta name="twitter:card" content="summary_large_image">
<meta name="twitter:title" content="Kagi Small Web">
<meta name="twitter:description" content="Discover the small web - personal blogs, independent YouTube channels, and webcomics from genuine humans on the internet.">
<meta name="twitter:image" content="{{ url_for('static', filename='HumanWeb.png', _external=True) }}">
<link rel="stylesheet" href="{{ url_for('static', filename='style.css') }}">
<link rel="prefetch" href="{{ next_link }}">
<link rel="prerender" href="{{ next_link }}">
@@ -68,9 +77,10 @@
</div>
</div>
<div id="url-display">
{% if author and author != '' and author !=" " %}
<span id="author" title="author">{{author}}</span> @
<span id="author" title="author">{{author}}</span> @
{% endif %}
<a href="{{url}}">{{short_url}}</a>
@@ -80,33 +90,18 @@
<a class="phone" href="{{url}}">{{domain}}</a>
</div>
</div>
<div class="middle">
<input type="radio" id="close-popup" name="popup" class="popup-radio" checked>
<!-- Note Popup
<div class="popup-container">
<label for="note-popup" class="popup-link">{{ 'Notes' if notes_count else 'Note' }}{{ ' (' ~ notes_count ~ ')' if notes_count else '' }}</label>
<input type="radio" id="note-popup" name="popup" class="popup-radio">
<div class="popup">
{% for note, timestamp in notes_list %}
<div class="note {% if loop.index % 2 == 0 %}alternate{% endif %}">
<span class="note-time">({{ timestamp|time_ago }})</span> {{ note }}
</div>
{% endfor %}
<form action="{{ prefix }}note" method="post">
<p style="text-align:left">Got thoughts on this post? Share them in a public note.</p>
<textarea name="note_content" class="note-textarea"></textarea>
<input type="hidden" name="url" value="{{ url }}" />
<button type="submit" class="button ok-button">Submit</button>
<label for="close-popup" class="button cancel-button">Cancel</label>
</form>
<form action="{{ prefix }}" method="get" id="search-form">
<div class="search-container">
<input type="text" name="search" placeholder="Search..." value="{{ search_query }}" id="search-input">
{% if search_query %}
<a href="{{ prefix }}" class="clear-search" title="Clear search">&times;</a>
{% endif %}
</div>
</div>
-->
<!-- Add a visually hidden submit button for accessibility and explicit submission -->
<button type="submit" style="display: none;" aria-hidden="true">Search</button>
</form>
<!-- Share Popup -->
<div class="popup-container">
<label for="spread-love" class="popup-link" title="Share this post">Share</label>
@@ -132,7 +127,8 @@
<label for="close-popup" class="sbutton cancel-button">Cancel</label>
</div>
</div>
<!-- Flag Popup -->
<!-- Flag Popup -->
<form action="{{ prefix }}flag_content?{{ request.query_string.decode()|safe }}" method="post">
<input type="hidden" name="url" value="{{ url }}">
<button title="Flag this post"
@@ -142,6 +138,32 @@
Flag{% if flag_content_count and flag_content_count > 0 %} <span class="flag-danger">({{ flag_content_count }})</span>{% endif %}</button>
</form>
</div>
<div class="middle">
<input type="radio" id="close-popup" name="popup" class="popup-radio" checked>
<!-- Note Popup
<div class="popup-container">
<label for="note-popup" class="popup-link">{{ 'Notes' if notes_count else 'Note' }}{{ ' (' ~ notes_count ~ ')' if notes_count else '' }}</label>
<input type="radio" id="note-popup" name="popup" class="popup-radio">
<div class="popup">
{% for note, timestamp in notes_list %}
<div class="note {% if loop.index % 2 == 0 %}alternate{% endif %}">
<span class="note-time">({{ timestamp|time_ago }})</span> {{ note }}
</div>
{% endfor %}
<form action="{{ prefix }}note" method="post">
<p style="text-align:left">Got thoughts on this post? Share them in a public note.</p>
<textarea name="note_content" class="note-textarea"></textarea>
<input type="hidden" name="url" value="{{ url }}" />
<button type="submit" class="button ok-button">Submit</button>
<label for="close-popup" class="button cancel-button">Cancel</label>
</form>
</div>
</div>
-->
{% if current_mode==1 %}
<a id="switch" href="{{ prefix }}">Web</a>
<a id="switch" href="{{ prefix }}?app">Appreciated</a>
@@ -167,16 +189,6 @@
<path d="M21 12.79A9 9 0 0111.21 3 7 7 0 1012.79 21 9 9 0 0121 12.79z"/>
</svg>
</button>
<form action="{{ prefix }}" method="get" id="search-form">
<div class="search-container">
<input type="text" name="search" placeholder="Search..." value="{{ search_query }}" id="search-input" style="width: 100px;">
{% if search_query %}
<a href="{{ prefix }}" class="clear-search" title="Clear search">&times;</a>
{% endif %}
</div>
<!-- Add a visually hidden submit button for accessibility and explicit submission -->
<button type="submit" style="display: none;" aria-hidden="true">Search</button>
</form>
<a href="https://kagi.com" title="Visit Kagi"><img src="{{ url_for('static', filename='UseKagiV4C.gif') }}" alt="Use Kagi" class="kagi-gif"></a>
<a href="https://github.com/kagisearch/smallweb" title="Visit GitHub repository">Contribute</a>
<div class="popup-container">
@@ -214,15 +226,11 @@ href="https://github.com/kagisearch/smallweb#small-web">our sources</a>
or check if your blog is in the <a class="container-link" href="https://github.com/kagisearch/smallweb/blob/main/smallweb.txt">list</a>. You'll also
encounter these pages now in <a class="container-link" href="https://kagi.com">Kagi search</a> when you're looking for something relevant.</p>
<p>Hit 'Next Post' to read something new. We only show posts from the last
seven days to keep it fresh. Feel like saying thanks or jotting down a thought? Use 'Appreciation' and 'Notes'. They'll be around for about a week, but hey, it's a way to say "hi" to someone else out here.
</p>
<p>Find a cool site or spot something sketchy? Use 'Report/Add Site' to help
curate the feed.
seven days or so to keep it fresh.
</p>
<p>
And yep, this whole thing is <a class="container-link"
href="https://github.com/kagisearch/smallweb">open-source</a>. Oh, and no JavaScript on our
end.</p>
href="https://github.com/kagisearch/smallweb">open-source</a>.</p>
<p>---</p>
<p>So, what do you say? Ready to meet some neighbors?
<br/><br/></p>
@@ -234,6 +242,7 @@ end.</p>
<a href="https://kagi.com"><img id="logo" src="{{ url_for('static', filename='doggo_px.png') }}" alt="Kagi Doggo
mascot"/></a>
</div>
{% if no_results %}
<div id="content" class="no-results" style="display: flex; flex-direction: column; justify-content: center; align-items: center; height: calc(100vh - 60px);">
<h2 style="font-family: pixel, Arial, sans-serif; color: #2c3e50; margin-bottom: 20px;">No results found for "{{ search_query }}"</h2>
@@ -273,6 +282,7 @@ end.</p>
referrerpolicy="no-referrer"
style="display:none;width:0;height:0;border:0;visibility:hidden;"></iframe>
{% endif %}
<script>
(() => { // IIFE keeps global scope clean
const CSS_ID = "global-dark-mode-style";
@@ -360,5 +370,8 @@ end.</p>
);
})();
</script>
</body>
</html>


@@ -13,6 +13,7 @@ steps:
'--region', 'us-central1',
'--allow-unauthenticated',
'--set-env-vars', 'URL_PREFIX=/smallweb',
'--update-secrets', 'MOD_SECRET_KEY=smallweb-mod-secret-key:latest',
'--service-account', 'smallweb@${PROJECT_ID}.iam.gserviceaccount.com',
'--cpu=1',
'--memory=1Gi',


@@ -9,6 +9,7 @@ http://trueniverse.com/trueniverserss.xml
http://www.incidentalcomics.com/feeds/posts/default
https://accurseddragon.com/comic/rss?id=1
https://aliendice.com/feed/
https://analognowhere.com/feed/rss.xml
https://arrhythmiacomic.com/feed.xml
https://ashinthewindcomic.com/rss
https://asterandthefire.com/feed/
@@ -21,19 +22,25 @@ https://cloverandcutlass.com/feed/
https://cyantian.net/feed/
https://damsels-dont-wear-glasses.com/rss.php
https://elephant.town/comic/rss
https://existentialcomics.com/rss.xml
https://explosm.net/rss.xml
https://feeds.feedburner.com/buttersafe
https://fluffygangcomic.com/feed.xml
https://foxes-in-love.tumblr.com/rss
https://ghostoflight.spiderforest.com/comic/rss?id=1
https://giftscomic.com/rss.php
https://halflightcomics.com/feed
https://heartofthestorm.co.uk/comic/rss
https://heirsoftheveil.com/feed/
https://hollymacycomic.com/feed.txt
https://huzzah.spiderforest.com/comic/rss?id=1
https://jackbeloved.com/feed/
https://joshreads.com/feed/
https://jpawlik.com/blog/category/comic/feed/
https://jumpherocomic.com/rss/
https://keyspace.spiderforest.com/comic/rss?id=1
https://keytothefuturesfate.com/feed/
https://killsixbilliondemons.com/atom/
https://kingsofsorts.com/feed/
https://latchkeykingdom.com/comics/feed/
https://laurenipsum.spiderforest.com/comic/rss?id=1
@@ -58,18 +65,22 @@ https://roar.spiderforest.com/comic/rss?id=1
https://saffronwave.spiderforest.com/comic/rss
https://sarahcandersen.com/rss
https://sarilho.net/en/feed
https://slumbertowncomic.com/atom/
https://slumbertowncomic.com/feed/
https://sunbirdcomic.com/feed/
https://thecityundersaturn.com/feed/
https://theonlyhalfsaga.com/ald/rss
https://thesecretknots.com/feed/
https://thunderstarcomic.art/comic/index.xml
https://tuppenceforstardust.spiderforest.com/comic/rss?id=1
https://uv.itsnero.com/feed/
https://vanguardcomic.com/feed/
https://warandpeas.com/feed/
https://witchofdezina.com/comic/rss
https://wizardzines.com/index.xml
https://workchronicles.com/feed/
https://www.conniewonnie.com/feeds/posts/default
https://www.davidrevoy.com/feed/rss/categorie2/webcomics/
https://www.davidrevoy.com/static4/rss-options
https://www.dumbingofage.com/feed/
https://www.entropycomic.com/feed/
@@ -95,8 +106,3 @@ https://www.tamurancomic.com/comic/rss?id=1
https://www.thebrightsidecomic.com/feed/
https://xkcd.com/rss.xml
https://yokokasquest.com/feed
https://thesecretknots.com/feed/
https://wizardzines.com/index.xml
https://hollymacycomic.com/feed.txt
https://jpawlik.com/blog/category/comic/feed/
https://existentialcomics.com/rss.xml


@@ -476,3 +476,4 @@ https://www.youtube.com/feeds/videos.xml?channel_id=UC0cCwNAtOCfDrghDFx0Jqlw # A
https://www.youtube.com/feeds/videos.xml?channel_id=UCIEIRz-KpYoEPnrNQuyHwJw # Scott & Frogpants https://www.youtube.com/channel/UCIEIRz-KpYoEPnrNQuyHwJw
https://www.youtube.com/feeds/videos.xml?channel_id=UC3XFhrQCuErV-XfuEJDjwAg # TimeLapseBuilding https://www.youtube.com/channel/UC3XFhrQCuErV-XfuEJDjwAg
https://www.youtube.com/feeds/videos.xml?channel_id=UCPsSoOCRNIj-eo2UbXfcdAw # xen-42 https://www.youtube.com/@xen-42
https://www.youtube.com/feeds/videos.xml?channel_id=UC-ufRLYrXxrIEApGn9VG5pQ # Reject Convenience https://www.youtube.com/@rejectconvenience