Should libraries run search engines? It seems that the original point of a library was to organize human knowledge and culture for the public benefit. The benefit has been great, but libraries aren't the main tool for finding information now. Search engines are. The notion that a library needs to be for books only is an arbitrary limitation. This oversight allowed corporations to move into that traditionally non-profit role, and I'm not convinced they've done a particularly good job.
There used to be more overlap between information search and information access/management.
So I don't think this is a controversial idea.
@onepict This post was inspired mostly by a take I've seen on the internet several times that goes something like "Libraries would be considered crazy if proposed today, so what good stuff are we missing out on because we're not already doing it?" To me, this felt like an example of something libraries specifically could be doing that we're missing out on. So yes, the non-controversialness is sort of a feature!
I do wonder what's happening, though. At uni, much of the compsci research on information search came from the libraries: the original search algorithms for information management, and work on things like OCR for handwriting. That wasn't very successful, since projects like Transcribe Bentham rely on crowdsourced volunteers. How did the disciplines get so separated?
My graduate project, in the early 2000s, was trying to do OCR on existing documents that had just been scanned in. That was fun: there were no real Java implementations, and the OCR libraries were proprietary and cost a lot of money. But I still have some of the papers from the early handwriting OCR research somewhere.
TL;DR: this is largely already happening; it's just targeting things other than websites as primary sources.
I'm not sure websites are that good a target, either. They're often pretty rubbish.
Interesting dilemma, actually. What value is there in making rubbish but accessible secondary/tertiary sources discoverable?
I recall that Back In The Day™️, whenever you ran into HTML, it would be explained in terms of SGML (fair), and that invariably led to a mention of Dublin Core. LaTeX, for example, also seemed to mention DC often at the time.
DC dropped off the radar after the ill-fated XHTML attempt, if not earlier.
HOWEVER, my librarian friend was utterly unsurprised by DC, and more surprised that I, as a pure compsci person, knew what it was.
@onepict @distractedmosfet It probably helps that one of my other friends, Dan Brickley, runs https://schema.org ... there's no direct connection to DC, but indirectly both draw on RDF historically, and RDF is sort of where the web world and DC formalized the idea that it's about data rather than HTML. There has been a lot of parallel and cross-fertilising work going on there since the 90s.
I've basically been in constant contact with people concerned with formalizing how to describe resources.
Where websites mostly differ is that they tend to be ad hoc, informal sources of information that are much harder even to describe formally, because it's not always clear what they *are*. Is a blog post by a doctor a medical resource or an opinion? Is it both?
Web search engines are...
@onepict @distractedmosfet I'm also very interested in this kind of thing from my #interpeer point of view. It's abundantly clear that computers do better with categories provided by schemata, but the web and search engines also demonstrate clearly that most people don't care.
Schema.org is interesting to me because it's specifically aimed at bridging that gap: it provides a schema vocabulary with which you can, for example, annotate your website content so that it looks more structured to crawlers and...
@onepict @distractedmosfet ... therefore becomes more of a well-defined thing for search engines. But most of that is going to happen outside the view of the user, who is just writing a blog post or some such.
(It's no surprise that Dan runs the project while being a Google employee; Google benefits from websites looking more structured to their crawler, of course.)
@onepict @distractedmosfet So this is a bit of a rambling comment thread; the main point, I think, is that libraries already have a bunch of tech that could serve as a basis for search engines.
A secondary point is that the main difference lies in how libraries and search engines look at different resources, and why; a third is that it's somewhat possible to bridge that gap.
As to whether it'd be a good idea for libraries to run search engines, well, I don't know. Yes and no?