[youtube:tab] Fallback to API when webpage fails to download by coletdjnz · Pull Request #1122 · yt-dlp/yt-dlp · GitHub

[youtube:tab] Fallback to API when webpage fails to download #1122

Merged on Oct 8, 2021 (63 commits)

coletdjnz commented Sep 29, 2021

In order to be accepted and merged into yt-dlp each piece of code must be in public domain or released under Unlicense. Check one of the following options:

  • I am the original author of this code and I am willing to release it under Unlicense
  • I am not the original author of this code but it is in public domain or released under Unlicense (provide reliable evidence)

What is the purpose of your pull request?

  • Bug fix
  • Improvement
  • New extractor
  • New feature

Description of your pull request and other information

This supersedes #682. (I'm making a new PR to keep things on topic; the other PR is a bit messy due to being experimental.)


This PR adds support for falling back to the Innertube API when the webpage fails to download for whatever reason (429, network error, initial data extraction failure, etc.)

It is also possible to manually skip the webpage download by using --extractor-args youtubetab:skip=webpage
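
For anyone embedding yt-dlp rather than using the command line, the same extractor argument can be written as an options dictionary. The sketch below only builds that dictionary. It assumes the `extractor_args` layout described in yt-dlp's embedding documentation (extractor key, then argument name, then a list of values), and `tab_skip_opts` is a hypothetical helper name, not part of yt-dlp.

```python
def tab_skip_opts(*skip):
    """Build options mirroring `--extractor-args youtubetab:skip=...`.

    Hypothetical helper: it only assembles a plain dict and does not
    import or call yt-dlp.
    """
    return {'extractor_args': {'youtubetab': {'skip': list(skip)}}}


# Skip the webpage download and go straight to the API.
opts = tab_skip_opts('webpage')

# With yt-dlp installed, the dict could then be passed along the lines of:
#   with yt_dlp.YoutubeDL(opts) as ydl:
#       ydl.extract_info(url, download=False)
```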

Also added a quick fix for the handling that resolves MP playlists to OLAK playlists, since that was broken.

At least at the time of writing, yt-dlp should now have a complete* workaround for the common** YouTube HTTP 429 error reported, for both video and tab extraction.
*except private content accessible only through a secondary or greater account/channel
**"common" refers to the case where only the webpage request gets a 429, but the API still works.

resolves #926, resolves ytdl-org/youtube-dl#23638 (and most of the other 429 reports upstream)

Known issues:

  • When using cookies, the webpage is required to extract the required variables. Without this, the API requests will be for the first channel of the first account.

    • If the user passes cookies and the webpage fails to download, the tab extractor will raise an error. This is to stop unwanted behavior in a variety of situations (e.g. an automated script meant to download a secondary channel's liked playlist starts downloading the first channel's liked playlist because the webpage failed).

    • Currently, the user can skip this check by passing --extractor-args youtubetab:skip=authcheck

  • channel search does not work (returns 0 results)
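
The fallback and the cookie check described above can be sketched as follows. This is an illustrative outline, not the code in this PR: `extract_tab_data`, `ExtractionError`, and the two callables are hypothetical names, and the only behavior assumed is what the description states (try the webpage, fall back to the API, refuse to fall back when cookies are in use unless `authcheck` is skipped).

```python
class ExtractionError(Exception):
    """Raised when falling back to the API would be unsafe."""


def extract_tab_data(fetch_webpage, fetch_api, *, has_cookies=False, skip=()):
    """Return (source, data), preferring the webpage over the API.

    fetch_webpage and fetch_api are hypothetical zero-argument callables.
    skip mirrors the values of `--extractor-args youtubetab:skip=...`.
    """
    data = None
    if 'webpage' not in skip:
        try:
            data = fetch_webpage()
        except Exception:
            # e.g. HTTP 429, network error, initial data extraction failure
            data = None
    if data is not None:
        return 'webpage', data

    # Without the webpage, the account/channel variables are unknown, so API
    # requests would act as the first channel of the first account.
    if has_cookies and 'authcheck' not in skip:
        raise ExtractionError(
            'webpage unavailable while using cookies; '
            'pass skip=authcheck to fall back to the API anyway')
    return 'api', fetch_api()
```

With cookies and a failing webpage, the call raises unless `skip` contains `'authcheck'`; without cookies it quietly returns the API result.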

TODO:


coletdjnz commented Sep 30, 2021

Had to fix and improve the visitorData handling since the home/recommended feed was broken.

Some info on visitorData for the sake of documentation:

When YT gives visitorData in the responseContext of any API response, it is, from what I can see, the most current one (so we need to update our visitorData to it). visitorData can change in some situations, such as ytdl-org/youtube-dl#28702.

If YouTube doesn't provide visitorData in responseContext, visitorData is the same as before (i.e., does not need updating).
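
The update rule from the two paragraphs above can be written as a small function. This is a minimal sketch that assumes the API response is a plain dict with an optional `responseContext.visitorData` entry; `update_visitor_data` is a hypothetical name, not the function used in the extractor.

```python
def update_visitor_data(current, response):
    """Return the visitorData to use for the next request.

    If the response carries visitorData in its responseContext, that value
    is treated as the most current one; otherwise the previous value is kept.
    """
    response_context = (response or {}).get('responseContext') or {}
    return response_context.get('visitorData') or current


visitor_data = 'old-token'
visitor_data = update_visitor_data(
    visitor_data, {'responseContext': {'visitorData': 'new-token'}})
# visitor_data is now 'new-token'; a response without the key leaves it as is.
```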

We've seen it being used to keep track of which comment section version to use (to ensure consistency) when YT was switching from the old to the new API a while back. Along with the recommended feed and the playlists in the above issue, maybe this suggests it is some sort of token that identifies a session state?


After this PR I do need to go through YoutubeIE and clean up its visitorData handling. I notice the comment one no longer works. Even though it may not be required as of now, it is good to have just in case (e.g. A/B testing as mentioned before).

Note: probably not a good idea to share visitorData across different clients.

coletdjnz and others added 2 commits September 30, 2021 21:37

coletdjnz commented Oct 4, 2021

Some things we'll move to future cleanup PR(s):

  • generalize extract_response and extract_webpage
  • fixup extract_response & extract_webpage INFO alerts (i.e., unavailable videos):
    • show only if incomplete data
    • otherwise, show in debug mode
  • fixup tab extractor tests (needs above)
  • remove old python2 hacks from tab entries
  • ensure consistency with visitorData

coletdjnz commented Oct 4, 2021

Some changes:

  • added a few tests for API Fallback mode
  • raise an error when the URL is not resolved (no endpoint provided) instead of an unexpected "tab not found" error. This usually happens when, for example, an invalid playlist link is provided. Normally on YouTube it redirects to the home page.
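
The second change can be illustrated with a short check. This sketch assumes the resolve step returns a dict whose `endpoint` entry is missing or empty when the URL cannot be resolved; that response shape, `get_resolved_endpoint`, and `UnresolvedURLError` are all assumptions made for illustration rather than the code in this PR.

```python
class UnresolvedURLError(Exception):
    """Raised when the resolve step returns no endpoint for a URL."""


def get_resolved_endpoint(url, resolve_response):
    """Return the endpoint from a resolve response, or raise a clear error.

    Failing here gives the user an explicit message instead of a later,
    unrelated failure when no tab can be found.
    """
    endpoint = (resolve_response or {}).get('endpoint')
    if not endpoint:
        raise UnresolvedURLError(f'Unable to resolve URL: {url}')
    return endpoint
```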

Issue: channel search (e.g. https://www.youtube.com/user/theCodyReeder/search?query=chicken):

  • The resolve_url endpoint resolves this correctly, but the query parameter that would usually be in plaintext JSON is instead in the protobuf params. YouTube's backend doesn't understand itself in this situation, so the API fallback does not work in this case. 😆

coletdjnz added the pending-review label and removed the pending-fixes label on Oct 5, 2021
pukkandan merged commit ac56cf3 into yt-dlp:master on Oct 8, 2021
nixxo pushed a commit to nixxo/yt-dlp that referenced this pull request Nov 22, 2021
…1122)

and add some extractor_args to force this mode
Authored by: coletdjnz
coletdjnz deleted the tab-api-fallback branch on March 4, 2022 06:03
pukkandan added the site-enhancement label and removed the pending-review label on Jun 6, 2022
vivandiu commented

Ok
