Over time, I developed a certain google-fu and expertise in finding references, papers, and books online. Some of these tricks are not well-known, like checking the Internet Archive (IA) for books. I try to write down my search workflow, and give general advice about finding and hosting documents.

'Google-fu' or search skill is something I've prided myself on ever since elementary school, when the librarian challenged the class to find things in the almanac; not infrequently, I'd win. The Internet is the greatest almanac of all, and to the curious, a never-ending cornucopia, so it makes me sad to see so many fail to find things---or not look at all.
Below, I've tried to provide, in a roughly chronological way, a flowchart of an online search.

Search {#search}

Preparation {#preparation}

The first thing you must do is develop a habit of searching when you have a question: "Google is your friend". Your only search guaranteed to fail is the one you never run.
  1. Query syntax knowledge

Know your basic Boolean operators & the key G search operators: double quotes for exact matches, hyphens for negation/exclusion, and […]
2. Hotkey acceleration (strongly recommended)

Enable some kind of hotkey search with both prompt and copy-paste selection buffer, to turn searching Google (G)/Google Scholar (GS)/Wikipedia (WP) into a reflex.
Example tools: AutoHotkey (Windows), Quicksilver (Mac), xclip+[…]'s search-engines/[…]'s Actions.Search/Prompt.Shell (Linux). DuckDuckGo offers 'bangs', within-engine special searches (most are equivalent to a kind of Google […]).

I make heavy use of the XMonad hotkeys, which I wrote, and which give me window manager shortcuts: while using any program, I can highlight a title string, and press […], paste with C-y, and edit it before a […]
3. Web browser hotkeys

For navigating between sets of results and entries, you should have good command of your tabbed web browser. You should be able to go to the address bar, move left/right in tabs, close tabs, open new blank tabs, go to a specific tab, etc. (In Firefox, respectively: […], C-PgUp, C-PgDwn, C-w, C-t, M-[1-9].)
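
The query operators from item 1 can be folded into a small helper that a hotkey script could call. This is only a minimal sketch, not the XMonad setup described above: the function names `urlencode`, `build_query`, and `scholar_url` are made up for illustration, the input is assumed to be plain ASCII, and the Google Scholar URL prefix `https://scholar.google.com/scholar?q=` is an assumption about the site's current URL scheme.

```shell
#!/bin/sh
# Percent-encode a string for a URL query parameter (ASCII input assumed).
urlencode() {
    s=$1
    out=
    while [ -n "$s" ]; do
        rest=${s#?}        # everything after the first character
        c=${s%"$rest"}     # the first character itself
        case $c in
            [A-Za-z0-9._~-]) out="$out$c" ;;                  # unreserved: keep
            *) out="$out$(printf '%%%02X' "'$c")" ;;          # else %XX
        esac
        s=$rest
    done
    printf '%s\n' "$out"
}

# Compose a query from an exact phrase, an optional excluded term,
# and an optional site restriction (all three names are illustrative).
build_query() {
    phrase=$1; exclude=$2; site=$3
    q="\"$phrase\""                                  # double quotes: exact match
    if [ -n "$exclude" ]; then q="$q -$exclude"; fi  # hyphen: negation
    if [ -n "$site" ]; then q="$q site:$site"; fi    # restrict to one domain
    printf '%s\n' "$q"
}

# Turn a query into a Google Scholar search URL (assumed URL format).
scholar_url() {
    printf 'https://scholar.google.com/scholar?q=%s\n' "$(urlencode "$1")"
}

# Example: print a URL that a hotkey could hand to the browser.
scholar_url "$(build_query 'Foo bar' baz example.com)"
```

Bound to a hotkey, the printed URL would be passed to whatever command opens a new browser tab; the selection buffer (eg via xclip) would supply the phrase.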

Searching {#searching}

Having launched your search in, presumably, GS, you must navigate the results.

In GS, remember that fulltext is not always denoted by a "[PDF]" link! Check the top hits by hand: there are often 'soft walls' which block web spiders but still let you download fulltext.

By Title {#by-title}

Title searches: if a paper fulltext doesn't turn up on the first page, start tweaking (hard rules cannot be given for this; it requires development of […]):
  • Keep in mind when searching that you want some results, but not too many or too few. A few hundred hits in GS is around the sweet spot. If you have less than a page of hits, you have made your search too specific. If deleting a few terms then yields way too many hits, try to filter out large classes of hits with a negation.
  • Tweak the title: quote the title; delete any subtitle; if there are colons, split it into two title quotes (instead of searching Foo bar: baz quux, or "Foo bar: baz quux", search "Foo bar" "baz quux"); swap their order.

  • Add/remove the year.

  • Add/remove the first author.

  • Delete any unusual characters or punctuation. (Libgen had trouble with colons for a long time, and many websites still do.)
  • Use GS's date range to search ±4 years (metadata can be wrong, publishing conventions can be odd, publishers can be
  • Try alternate spellings of British/American terms. Try searching GS for just the author (

  • Add jargon which might be used by relevant papers; for example, if you are looking for an article on college admissions statistics, any such analysis would probably be using […]

If you don't know what jargon might be used, you may need to back off and look for a review article or textbook. Nothing is more frustrating than knowing there is a large literature on a topic ("Cowen's law") but being unable to find it because it's named something completely different than expected, and many fields have different names for the same concept or tool.

  • beware hastily dismissing 'bibliographic' websites:

While a site like elibrary.ru is (almost) always useless & clutters up search results, every so often I run into a peculiar foreign website (often Indian or Chinese) which happens to have a scan of a book or paper.
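
The mechanical title tweaks in the list above (quote the title, drop the subtitle, split at the colon, swap the halves, strip punctuation) can be generated rather than typed by hand. The following is a minimal sketch under stated assumptions: `title_variants` is a hypothetical helper name, it splits only at the first colon, and it prints query strings without running any search.

```shell
#!/bin/sh
# Print one search-query variant per line for a paper title.
title_variants() {
    title=$1
    printf '"%s"\n' "$title"                      # whole title, quoted
    case $title in
        *:*)
            main=${title%%:*}                     # text before the first colon
            sub=${title#*:}                       # text after the first colon
            sub=${sub# }                          # drop one leading space
            printf '"%s"\n' "$main"               # subtitle deleted
            printf '"%s" "%s"\n' "$main" "$sub"   # two quoted halves
            printf '"%s" "%s"\n' "$sub" "$main"   # halves in swapped order
            ;;
    esac
    # Unquoted, with all punctuation deleted (for sites that choke on colons).
    printf '%s\n' "$title" | tr -d '[:punct:]'
}

title_variants 'Foo bar: baz quux'
```

Year and first-author variants are left out because they depend on metadata that is often wrong; appending them by hand to whichever variant gets closest is usually quicker.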
Hard Cases {#hard-cases}

If the basic tricks aren't giving any hints of working, you will have to get serious. The title may be completely wrong, or it may be indexed under a different author, or not directly indexed at all, or hidden inside a database. Here are some indirect approaches to finding articles:

  • Take a look in GS's "related articles" or "cited by" to find similar articles such as later versions of a paper which may be useful. (These are also good features to know about if you want to check things like "has this ever been replicated?" or are still figuring out the right jargon to search.)
  • Look for hints of hidden bibliographic connections. Does a paper pop up high in the search results which doesn't seem to make sense? GS generally penalizes items which exist as simply bibliographic entries, so if one is ranked high in a sea of fulltexts, that should make you wonder why it is being prioritized. Similarly for Google Books (GB): a book might be forbidden from even snippets but rank high; that might be for a good reason.
  • some papers can be found by searching for the volume or book title to find it indirectly, especially conference proceedings or anthologies; many papers
Conferences are particularly complex bibliographically, so you may need to apply the same tricks as for page titles: drop parts, don't fixate on the numbers, know that the authors or ISBN or ordering of "title:subtitle" can differ between sources, etc.

  • Another approach is to look up the listing for a journal issue, and find the paper by hand; sometimes papers are listed in the journal issue's online Table of Contents, but just don't appear in search engines. In particularly insidious cases, a paper may be digitized & available---but lumped in with another paper due to error, or only as part of a catch-all file which contains the last 20 miscellaneous pages of an issue. Page range citations are particularly helpful here because they show where the overlap is, so you can download the suspicious overlapping 'papers' to see what they
Esoteric as this may sound, this has been a problem on multiple occasions. (A particularly epic example was
  • master/PhD theses: sorry. It may be hopeless if it's pre-2000. You may well find the citation and even an abstract, but actual fulltext...? If you have a university proxy, you may be able to get a copy off […]. Otherwise, you need full university ILL services^3^{#fnref3}, and even that might not be enough (a surprising number of universities appear to restrict access only to the university students/faculty, with the complicating factor of most theses being stored on microfilm).

  • if images are involved, a reverse image search in Google Images or
  • domain knowledge:

  • US federal court documents can be downloaded off
There is no equivalent for state or county court systems, which are balkanized and use a thousand different systems (often privatized & charging far more than PACER); those must be handled on a case by case basis. (Interesting trivia point: according to Nick Bilton's account of the Silk Road 1 case, the FBI and other federal agencies in the SR1 investigation would deliberately steer cases into state rather than federal courts in order to hide them from the relative transparency of the PACER system. The use of multiple court systems can backfire on them, however, as in the case of SR2's DoctorClu (see […]).)
  • for charity financial filings, do Form 990 site:charity.com and then check GuideStar (eg "Case Study: Reading Edge's financial filings")

  • for anything related to education, do a site search of ERIC, which is similar to IA in that it will often have fulltext which is buried in the usual search results

By Quote or Description {#by-quote-or-description}

For quote/description searches: if you don't have a title and are falling back on searching quotes, try varying your search similarly to titles:

  • Try the easy search first.
  • Don't search too long a quote, a sentence or two is usually enough and can be helpful in turning up other sources quoting different chunks which may have better citations.
  • Try multiple sub-quotes from a big quote, especially from the beginning and end, which are likely to overlap with quotes which have prior or subsequent passages.
  • Look for passages in the original text which seem like they might be based on the same source, particularly if they are simply dropped in without any hint at sourcing; authors typically don't cite every time they draw on a source, usually only the first time, and during editing the 'first' appearance of a source could easily have been moved to later in the text. All of these additional uses are something to add to your searches.
  • You are fighting a game of Chinese whispers, so look for unique-sounding sentences and terms which can survive garbling in the repeated transmissions. Avoid phrases which could be easily reworded in multiple equivalent ways, as people usually will reword them when quoting from memory, screwing up literal searches.
  • Watch out for punctuation and spelling differences hiding hits.
  • Search for oddly-specific phrases or words, especially numbers. 3 or 4 keywords is usually enough.
  • Longer, less witty versions are usually closer to the original and a sign you are on the right trail.
  • Switch to GB and hope someone paraphrases or quotes it, and includes a real citation; if you can't see the full passage or the reference section, look up the book in Libgen.
Dealing With Paywalls {#dealing-with-paywalls}

A paywall can usually be bypassed by using Libgen (LG)/Sci-Hub (SH): papers can be searched directly (ideally with the DOI, but title+author with no quotes will usually work), or an easier way may be to prepend sci-hub.tw (or whatever SH mirror you prefer) to the URL of a paywall.

If those don't work and you do not have a university proxy or alumni access, many university libraries have IP-based access rules and also open WiFi or Internet-capable computers with public logins inside the library, which can be used, if you are willing to take the time to visit a university in person, for using their databases (probably a good idea to keep a list of needed items before paying a visit).

If that doesn't work, there is a more opaque ecosystem of filesharing services: booksc/bookfi/bookzz, private torrent trackers like Bibliotik, IRC channels with XDCC bots like #bookz/#ebooks, old P2P networks like eMule, private DC++ hubs...

Site-specific notes:

  • Elsevier/sciencedirect.com: easy, always available via SH/LG
Note that many Elsevier journal websites do not work with the SH proxy, although their
  • PsycNET: one of the worst sites; SH/LG never work with the URL method, rarely work with paper titles/DOIs, and with my university library proxy, combined searches don't usually work (frequently failing to pull up even bibliographic entries), and only DOI or manual title searches in the EBSCOhost database have a chance of fulltext. (EBSCOhost itself is a fragile search engine which is difficult to query reliably in the absence of a DOI.) Try to find the paper anywhere else besides PsycNET!
Request {#request}

Last resort: if none of this works, there are a few places online you can request a copy (however, they will usually fail if you have exhausted all previous avenues):

Finally, you can always try to contact the author. This only occasionally works for the papers I have the hardest time with, since they tend to be old ones where the author is dead or unreachable---any author publishing a paper since 1990 will usually have been digitized
Post-finding {#post-finding}

After finding a fulltext copy, you should find a reliable long-term link/place to store it and make it more findable:

  • never link LG/SH:

Always operate under the assumption they could be gone tomorrow. (As indeed my uncle found out with Library.nu shortly after paying for a lifetime membership.) There are no guarantees either one will be around for long under their legal assaults, and no guarantee that they are being properly mirrored or will be restored elsewhere. Download anything you need and keep a copy of it yourself and, ideally, host it publicly.
  • never rely on a papers.nber.org/tmp/ or psycnet.apa.org URL, as they are temporary

  • never link Scribd: they are a scummy website which impedes downloads, and anything on Scribd usually first appeared elsewhere anyway.

  • avoid linking to ResearchGate (compromised by investment & PDFs get deleted routinely, apparently often by authors) or Academia.edu (the URLs are one-time and break)

  • be careful linking to Nature.com (if a paper is not explicitly marked as Open Access, even if it's available, it may disappear in a few months!); similarly, watch out for […], tandfonline.com, jstor.org, springer.com, springerlink.com, & mendeley.com

  • be careful linking to academic personal directories on university websites (often noticeable by the Unix convention
  • check & improve metadata.

Adding metadata to papers/books is a good idea because it makes the file findable in G/GS (if it's not online, does it really exist?) and helps you if you decide to use bibliographic software like
exiftool -All prints all metadata, and the metadata can be set individually using similar fields.

For papers hidden inside volumes or other files, you should extract the relevant page range to create a single relevant file. (For extraction of PDF page-ranges, I use pdftk, eg: pdftk 2010-davidson-wellplayed10-videogamesvaluemeaning.pdf cat 180-196 output 2009-fortugno.pdf.)

I try to set at least title/author/DOI/year/subject, and stuff any additional topics & bibliographic information into the "Keywords" field. Example of setting metadata:
  exiftool -Author="Frank P. Ramsey" -Date=1930 -Title="On a Problem of Formal Logic" -DOI="10.1112/plms/s2-30.1.264" \
      -Subject="mathematics" -Keywords="Ramsey theory, Ramsey's theorem, combinatorics, mathematical logic, decidability, \
      first-order logic, Bernays-Schönfinkel-Ramsey class of first-order logic, _Proceedings of the London Mathematical \
      Society_, Volume s2-30, Issue 1, 1 January 1930, pg264-286" 1930-ramsey.pdf
  • if a scan, it may be worth editing the PDF to crop the edges, threshold to binarize it (which, for a bad grayscale or color scan, can drastically reduce filesize while increasing readability), and OCRing it. I use
  • if possible, host a public copy; especially if it was very difficult to find, even if it was useless, it should be hosted. The life you save may be your own.

  • for bonus points, link it in appropriate places on Wikipedia

Advanced {#advanced}

Aside from the highly-recommended use of hotkeys and Booleans for searches, there are a few useful tools for the researcher, which while expensive initially, can pay off in the long-term:

  • archiver-bot: automatically archive your web browsing and/or links from arbitrary websites to forestall linkrot; particularly useful for detecting & recovering from dead PDF links

  • PubMed & GS search alerts: set up alerts for a specific search query, or for new citations of a specific paper.
  1. PubMed has straightforward conversion of search queries into alerts: "Create alert" below the search bar. (Given the volume of PubMed indexing, I recommend carefully tailoring your search to be as narrow as possible, or else your alerts may overwhelm you.)
  2. To create generic GS search query alert, simply use the "Create alert" on the sidebar for any search. To follow citations of a key paper, you must: 1. bring up the paper in GS; 2. click on "Cited by X"; 3. then use "Create alert" on the sidebar.
  • Google Custom Search Engines (a GCSE is a specialized search engine limited to whitelisted pages/domains etc; eg my […]. If you find yourself regularly including many domains in a search, or blacklisting domains with -site: or using many negations to filter out common false positives, it may be time to set up a GCSE.)

  • Clipping/note-taking services like Evernote/Microsoft OneNote: regularly making and keeping excerpts creates a personalized search engine, in effect.

This can be vital for refinding old things you read where the search terms are hopelessly generic or you can't remember an
Useful tools to know about: […], cURL, HTTrack; Firefox plugins: NoScript, uBlock origin, Live HTTP Headers, Bypass Paywalls, cookie exporting. Short of downloading a website, it might also be useful to pre-emptively archive it by using linkchecker to crawl it, compile a list of all external & internal links, and store them for processing by another archival program (see Archiving URLs for examples).

With proper use of pre-emptive archiving tools like archiver-bot, fixing linkrot in one's own pages is much easier, but that leaves other references. Searching for lost web pages is similar to searching for papers:

  • if the page title is given, search for the title.

It is a good idea to include page titles in one's own pages, as well as the URL, to help with future searches, since the URL may be meaningless gibberish on its own, and pre-emptive archiving can fail. HTML supports both […] parameters in link tags, and, in cases where displaying a title is not desirable (because the link is being used inline as part of normal hypertextual writing), titles can be included cleanly in Markdown documents like this: [inline text description](URL "Title").

  • check the URL: is it weird or filling with trailing garbage like
  • restrict G search to the original domain with site:, or to related domains

  • restrict G search to the original date-range/years

  • try a different search engine: corpuses can vary, and in some cases G tries to be too smart for its own good when you need a literal search; DuckDuckGo and Bing are usable alternatives (especially if one of DuckDuckGo's 'bang' special searches is what one needs)

  • if nowhere on the clearnet, try the Internet Archive (IA) or the
IA is the default backup for a dead URL. If IA doesn't Just Work, there may be other versions in it:

  • did the IA 'redirect' you to an error page? Kill the redirect and check the earliest stored version. Did the page initially load but then error out/redirect? Disable JS with NoScript and reload.

  • IA lets you list all URLs with any archived versions, by searching for URL/*; the list of available URLs may reveal an alternate newer/older URL. It can also be useful to filter by filetype or substring. For example, one might list all URLs in a domain, and if the list is too long and filled with garbage URLs, then use the "Filter results" incremental-search widget to search for "uploads/" on a WordPress blog.^4^{#fnref4}

![Screenshot of an oft-overlooked feature of the Internet Archive: displaying all available/archived URLs for a specific domain, filtered down to a subset matching a string like uploads/.](https://www.gwern.net/images/2019-internetarchive-domainsearch-screenshot.png)
* [`wayback_machine_downloader`](https://github.com/hartator/wayback-machine-downloader) (not to be confused with the [`internetarchive` Python package](https://github.com/jjjake/internetarchive) which provides a CLI interface to uploading files) is a Ruby tool which lets you download whole domains from IA, which can be useful for running a local fulltext search using regexps (a good `grep` query is often enough), in cases where just looking at the URLs via `URL/*` is not helpful. (An alternative which might work is [websitedownloader.io](https://websitedownloader.io "Wayback Machine Downloader: Download the source code and assets from Wayback Machine").)

  • did the domain change, eg from
  • is this a Blogspot blog? Blogspot is uniquely horrible in that it has versions of each blog for every country domain: a
  • did the website provide RSS feeds?

A little known fact is that   
[downloaded](https://www.archiveteam.org/index.php/Google_Reader) a large fraction of GR's historical RSS feeds, and   
  • archive.today: an IA-like mirror

  • any local archives, such as those made with my

  • Google Cache (GC): GC works, sometimes, but the copies are usually the worst around, ephemeral & cannot be relied upon. Google also appears to have been steadily deprecating GC over the years, as GC shows up less & less in search results.
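
The `URL/*` listing and substring filter described above can also be scripted. The sketch below only builds URLs and filters text; it makes no network requests itself. The function names are illustrative, and the CDX endpoint with its `fl=original` and `collapse=urlkey` parameters is an assumption based on the Wayback Machine's public CDX server, not something the text above specifies.

```shell
#!/bin/sh
# URL of the Wayback Machine page listing every archived URL under a prefix
# (the same "URL/*" search described above).
ia_list_url() {
    printf 'https://web.archive.org/web/*/%s/*\n' "${1%/}"
}

# Plain-text equivalent via the CDX server (assumed endpoint & parameters):
# one original URL per line, duplicates collapsed.
ia_cdx_url() {
    printf 'https://web.archive.org/cdx/search/cdx?url=%s/*&fl=original&collapse=urlkey\n' "${1%/}"
}

# Keep only the listed URLs containing a literal substring, like the
# "Filter results" widget.
filter_urls() {
    grep -F -- "$1"
}

ia_list_url example.com
ia_cdx_url example.com
# Possible use (requires network access, so left commented out):
#   curl -s "$(ia_cdx_url example.com)" | filter_urls 'uploads/'
```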

Digital {#digital}

E-books are rarer and harder to get than papers, although the situation has improved vastly since the early 2000s. To search for books online:

  • book searches tend to be faster and simpler than paper searches, and to require less cleverness in search query formulation. Typically, if the main title + author doesn't turn it up, it's not online. (In some cases, the author order is reversed, or the title:subtitle are reversed, and you can find a copy by tweaking your search, but these are rare.)
  • search G for title (book fulltexts usually don't show up in GS); to double-check, you can try a
  • then check LG
  • the Internet Archive (IA/archive.org) has many books scanned which do not appear easily in search results.

  • If an IA hit pops up in a search, always check it; the OCR may offer hints as to where to find it. If you don't find anything, try doing an IA site search in G (not the IA built-in search engine), eg book title site:archive.org.

  • Google Play: uses the same PDF DRM as IA, and can be broken the same way
  • HathiTrust also hosts many book scans, which can be searched for clues or hints or jailbroken.

HathiTrust blocks whole-book downloads but it's easy to download each page in a loop and stitch them together, for example:

Another example of this would be the Wellcome Library; while looking for
  1. https://dlcs.io/iiif-img/wellcome/1/5c27d7de-6d55-473c-b3b2-6c74ac7a04c6/full/2212,/0/default.jpg
  2. https://dlcs.io/iiif-img/wellcome/1/d514271c-b290-4ae8-bed7-fd30fb14d59e/full/2212,/0/default.jpg
  3. etc

Instead of being sequentially numbered 1--90 or whatever, they all live under a unique hash or ID. Fortunately, one of the metadata files, the 'manifest' file, provides all of the hashes/IDs (but not the high-quality download URLs). Extracting the IDs from the manifest can be done with some quick
loop for download
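
As a rough illustration of the download step (not the original commands), the sketch below turns a list of image IDs into full-size JPG URLs following the pattern of the two example URLs above. The helper name `ids_to_urls` and the input file `ids.txt` (one ID per line, already extracted from the manifest) are hypothetical.

```shell
#!/bin/sh
# Read one image ID per line on stdin and print the matching JPG URL,
# using the URL pattern of the two example URLs above.
ids_to_urls() {
    while IFS= read -r id; do
        [ -n "$id" ] || continue    # skip blank lines
        printf 'https://dlcs.io/iiif-img/wellcome/1/%s/full/2212,/0/default.jpg\n' "$id"
    done
}

printf '%s\n' '5c27d7de-6d55-473c-b3b2-6c74ac7a04c6' | ids_to_urls
# Possible use (requires network access, so left commented out):
#   ids_to_urls < ids.txt | wget -i -
```

Because every page is saved as default.jpg, a real download loop would also need to rename each file by its position in the list to preserve page order.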

And then the 59MB of JPGs can be cleaned up as usual with
  • ebook.farm is a Kindle pirate website which takes Amazon gift-cards as currency; it has many recent e-books which are DRM-free and can be uploaded to LG.

  • remember the analog hole works for papers/books too:

if you can find a copy to
Physical {#physical}

Books are something of a double-edged sword compared to papers/theses. On the one hand, books are much more often unavailable online, and must be bought offline, but at least you almost always can buy used books offline without much trouble (usually <$10 total); on the other hand, while papers/theses are often available online, when one is unavailable, it's usually very unavailable, and you're stuck (unless you have a university ILL department backing you up or are willing to travel to the few or only universities with paper or microfilm copies).

Purchasing from used book sellers:

  • Google Books is a good starting point for seller links; if buying from a marketplace like AbeBooks/Amazon/Barnes & Noble, it's worth searching the seller to see if they have their own website, which is potentially much cheaper. They may also have multiple editions in stock.

  • Sellers:

  • bad: eBay & Amazon, due to high-minimum-order+S&H, but can be useful in providing metadata like page count or ISBN or variations on the title

  • good: AbeBooks, Thrift Books, Better World Books, B&N, Discover Books.

Note: on AbeBooks, international orders can be useful (especially for behavioral genetics or psychology books) but be careful of international orders with your credit card---many debit/credit cards will fail and trigger a fraud alert, and PayPal is not accepted.  
  • if a book is not available or too expensive, set price watches: AbeBooks supports email alerts on stored searches, and Amazon can be monitored via […] (remember the CCC price alert you want is on the used third-party category, as new books are more expensive, less available, and unnecessary).


  • destructive vs non-destructive: destructively debinding books with a razor or guillotine cutter works much better & is much less time-consuming than spreading them on a flatbed scanner to scan one-by-one^7^{#fnref7}, because it allows use of a sheet-fed scanner, which is easily 5x faster and gives higher-quality scans (because the sheets will be flat, scanned edge-to-edge, and much more closely aligned).

  • Tools:

  • For simple debinding of a few books a year, an X-acto knife/razor is good (avoid the 'triangle' blades, get curved blades intended for large cuts instead of detail work)

  • once you start doing more than one a month, it's time to upgrade to a guillotine blade paper cutter (a fancier swinging-arm paper cutter, which uses a two-joint system to clamp down and cut uniformly).

A guillotine blade can cut chunks of 200 pages easily without much slippage, so for books with more pages, I use both: an X-acto to cut along the spine and turn it into several 200-page chunks for the guillotine cutter.  
  • at some point, it may make sense to switch to a scanning service like […] ([…]/ocrmypdf and will save a lot of money as they, amazingly, bill by 100-page units). Books can be sent directly to 1DS, reducing logistical hassles.

  • after scanning, crop/threshold/OCR/add metadata
  • Adding metadata: same principles as papers. While more elaborate metadata can be added, like bookmarks, I have not experimented with those yet.

  • Saving files:

In the past, I used DjVu for documents I produce myself, as it produces much smaller scans than gscan2pdf's default PDF settings due to a buggy Perl library (at least half the size, sometimes one-tenth the size), making them more easily hosted & a superior browsing experience.

The downsides of DjVu are that not all PDF viewers can handle DjVu files, and it appears that G/GS ignore all DjVu files (despite the format being 20 years old), rendering them completely unfindable online. In addition, DjVu is an increasingly obscure format and has, for example, been dropped by the IA as of 2016. The former is a relatively small issue, but the latter is fatal---being consigned to oblivion by search engines largely defeats the point of scanning! ("If it's not in Google, it doesn't exist.") Hence, despite being a worse format, I now recommend PDF and have stopped using DjVu for new scans
  • Uploading: to LibGen, usually. For backups, filelockers like Dropbox, Mega, MediaFire, or Google Drive are good. I usually upload 3 copies including LG. I rotate accounts once a year, to avoid putting too many files into a single account.

  • Hosting: hosting papers is easy but books come with risk:

Books can be dangerous; in deciding whether to host a book, my rule of thumb is to host only books which are pre-2000 and which do not have Kindle editions or other signs of active exploitation, and so are effectively 'orphan works'.

As of 11 December 2018, hosting 3763 files over 8 years (very roughly, assuming linear growth, <5.5 million document-days of hosting: 3763⋅0.5⋅8⋅365.25=5497743), I've received 3 takedown orders: a behavioral genetics textbook (2013), The Handbook of Psychopathy (2005), and a recent meta-analysis paper (Roberts et al 2016). I broke my rule of thumb to host the 2 books (my mistake), which leaves only the 1 paper, which I think was a fluke. So, as long as one avoids relatively recent books, the risk should be minimal.
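
The document-days estimate can be reproduced in one line. This is a small sketch of that arithmetic: the 0.5 factor encodes the linear-growth assumption (on average, each file has been hosted for half the period), and the function name `document_days` is illustrative.

```shell
#!/bin/sh
# document-days = files * 0.5 (linear growth: average file hosted for half
# the period) * years * 365.25 days/year
document_days() {
    awk -v files="$1" -v years="$2" \
        'BEGIN { printf "%d\n", files * 0.5 * years * 365.25 }'
}

document_days 3763 8    # prints 5497743
```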

Searching the Google Reader archives {#searching-the-google-reader-archives}

One way to 'undelete' a blog or website is to use Google Reader (GR).

GR crawled regularly almost all blogs' RSS feeds; RSS feeds often contain the fulltext of articles. If a blog author writes an article, the fulltext is included in the RSS feed, GR downloads it, and then the author changes their mind and edits or deletes
However, before it was closed, Archive Team launched a major effort to download as much of GR as possible. So in that dump, there may be archives of all of a random blog's posts. Specifically: if a GR user subscribed to it; if Archive Team knew about it; if they requested it in time before closure; and if GR did keep full archives stretching back to the first posting.

Downside: the Archive Team dump is
Results {#results}

My dd extraction was successful, and the resulting HTML/RSS could then be browsed with a command like
