Bizarre duplication of metadata on different vods #1263

Open
2 tasks done
EthanZeigler opened this issue Dec 8, 2024 · 34 comments
Labels
bug Something isn't working

Comments

@EthanZeigler

Checklist

Edition

Command Line Interface

Describe your issue here

I'm not sure how to describe this other than just showing.

I have been using this tool to download vods from a friend of mine for months and inserting everything into a searchable database. That isn't important for the bug but will make explaining the circumstances easier. I was using the latest version at the time of the bugs. The issues occurred in september and october.

Over this time period i've accumulated over 200 vod chats and while implementing file hashing to not reprocess data, I ran into this: 2 vod downloads where the metadata appears to have been merged and mangled in a way that makes absolutely no sense. For the record, these are different vods. They did not occur on the same day at the same time. They should be completely distinct. And yet, they're not?

I'm aware this is likely a bug on twitch's end, but it's so unusual I felt I needed to put it here in case anyone else has run into the same behavior. The affected vods are in september and october of 2024. For privacy reasons I've redacted the name and ID of the streamer but can provide them privately if requested. Or you can just look up the vod id if it's floating around, but I wanted to avoid search engines indexing the name.

Does anyone know what on earth happened here???

{
  "FileInfo": {
    "Version": {
      "Major": 1,
      "Minor": 3,
      "Patch": 1
    },
    "CreatedAt": "2024-09-16T23:21:16.9694971-04:00",
    "UpdatedAt": "0001-01-01T00:00:00"
  },
  "streamer": {
    "name": "<redacted>",
    "id": 00000
  },
  "video": {
    "title": "THE CRITTER IS: returning to my barista roots !LURK !DISCORD",
    "description": null,
    "id": "2252936436",
    "created_at": "2024-09-16T22:57:45Z",
    "start": 0,
    "end": 12966,
    "length": 12966,
    "viewCount": 20,
    "game": "The Closing Shift",
    "chapters": [
      {
        "id": "05f0636da035b42ca2411314878bde67",
        "startMilliseconds": 0,
        "lengthMilliseconds": 80000,
        "type": "GAME_CHANGE",
        "description": "The Closing Shift",
        "subDescription": "",
        "thumbnailUrl": "",
        "gameId": "1419063656",
        "gameDisplayName": "The Closing Shift",
        "gameBoxArtUrl": "https://static-cdn.jtvnw.net/ttv-boxart/1419063656_IGDB-40x53.jpg"
      },
      {
        "id": "1405752318707a2540edb9e88a1a54f5",
        "startMilliseconds": 80000,
        "lengthMilliseconds": 12886000,
        "type": "GAME_CHANGE",
        "description": "Art",
        "subDescription": "",
        "thumbnailUrl": "",
        "gameId": "509660",
        "gameDisplayName": "Art",
        "gameBoxArtUrl": "https://static-cdn.jtvnw.net/ttv-boxart/509660-40x53.jpg"
      }
    ]
  }
}
{
  "FileInfo": {
    "Version": {
      "Major": 1,
      "Minor": 3,
      "Patch": 1
    },
    "CreatedAt": "2024-09-18T00:57:02.0421814-04:00",
    "UpdatedAt": "0001-01-01T00:00:00"
  },
  "streamer": {
    "name": "<redacted>",
    "id": 00000
  },
  "video": {
    "title": "THE CRITTER IS: chilllll drawinggg !LURK !DISCORD",
    "description": null,
    "id": "2252936436",
    "created_at": "2024-09-16T22:57:45Z",
    "start": 0,
    "end": 12966,
    "length": 12966,
    "viewCount": 1681,
    "game": "Art",
    "chapters": [
      {
        "id": "05f0636da035b42ca2411314878bde67",
        "startMilliseconds": 0,
        "lengthMilliseconds": 80000,
        "type": "GAME_CHANGE",
        "description": "Art",
        "subDescription": "",
        "thumbnailUrl": "",
        "gameId": "509660",
        "gameDisplayName": "Art",
        "gameBoxArtUrl": "https://static-cdn.jtvnw.net/ttv-boxart/509660-40x53.jpg"
      },
      {
        "id": "1405752318707a2540edb9e88a1a54f5",
        "startMilliseconds": 80000,
        "lengthMilliseconds": 12886000,
        "type": "GAME_CHANGE",
        "description": "Art",
        "subDescription": "",
        "thumbnailUrl": "",
        "gameId": "509660",
        "gameDisplayName": "Art",
        "gameBoxArtUrl": "https://static-cdn.jtvnw.net/ttv-boxart/509660-40x53.jpg"
      }
    ]
  }
}

Add any related files or extra information here

No response

EthanZeigler added the bug label Dec 8, 2024
@ScrubN
Collaborator

ScrubN commented Dec 8, 2024

Unfortunately because the VODs are expired I cannot do any debugging, however there are several details (such as the chapter ids) that suggest the issue was probably not created by TwitchDownloader.

@ScrubN
Collaborator

ScrubN commented Dec 8, 2024

For now, you will just need to manually fix it, however if you notice it again while the VOD is still available, let me know and I'll look into it.

@EthanZeigler
Author

Will do. Agreed, I don't think this is the downloader. It looks like twitch's API having a stroke in a completely nonsensical manner. The good news is that the way I've set up my systems now, I'll be alerted as soon as a response like this is detected. If/when it happens, I'll update this issue asap.

@ScrubN
Collaborator

ScrubN commented Dec 8, 2024

It wouldn't be the first time Twitch's API delivered wrong or incomplete data. If this becomes a reoccurring issue, we will likely need to either change how we fetch VOD chapters or add logic to combine the extra chapters.

Checking TwitchTracker, it doesn't seem to have the same issues, but it might also use one of the other API calls for fetching VOD chapters.
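One way to read "combine the extra chapters" is to merge neighbouring chapters that report the same game and touch end to start. The sketch below does that for the chapter objects shown in the chat JSON above. It is an illustration only, not TwitchDownloader's code: the function name `merge_adjacent_chapters` is hypothetical, and merging on `gameId` is an assumption about what "combine" would mean here.

```python
from __future__ import annotations


def merge_adjacent_chapters(chapters: list[dict]) -> list[dict]:
    """Merge chapters that are contiguous in time and share a gameId.

    Hypothetical helper: field names follow the chat JSON in this issue.
    The input list and its dicts are left unmodified.
    """
    merged: list[dict] = []
    for chapter in sorted(chapters, key=lambda c: c["startMilliseconds"]):
        previous = merged[-1] if merged else None
        contiguous = (
            previous is not None
            and previous["gameId"] == chapter["gameId"]
            and previous["startMilliseconds"] + previous["lengthMilliseconds"]
            == chapter["startMilliseconds"]
        )
        if contiguous:
            # Extend the previous chapter instead of keeping a duplicate entry.
            previous["lengthMilliseconds"] += chapter["lengthMilliseconds"]
        else:
            merged.append(dict(chapter))  # copy so the caller's data stays intact
    return merged
```

Applied to the second JSON in the first post (two "Art" chapters), this would leave a single chapter covering the whole VOD, while the first JSON (The Closing Shift, then Art) would keep both chapters.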

@EthanZeigler
Author

EthanZeigler commented Dec 8, 2024

It's not just the chapters. The vod ID is also nonsense. That's really what i ran into first and noticed the chapter problem on top of it. I'm using the vod id as a database pk and got a file hash mismatch warning when importing since it thought these were the same vod, but clearly they're not.
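For anyone wanting a similar check, here is a minimal sketch of the kind of hash comparison described above, assuming an in-memory dict in place of the real database. The names `sha256_of` and `check_import` are made up for illustration and are not taken from the importer.

```python
import hashlib
from pathlib import Path


def sha256_of(path: Path) -> str:
    """Return the hex SHA-256 digest of a file, read in chunks."""
    digest = hashlib.sha256()
    with path.open("rb") as handle:
        for chunk in iter(lambda: handle.read(65536), b""):
            digest.update(chunk)
    return digest.hexdigest()


def check_import(index: dict, vod_id: str, path: Path) -> str:
    """Classify a file against an index keyed by VOD id.

    `index` maps vod_id -> digest and stands in for the database table
    (an assumption for this sketch). Returns "new", "unchanged" or "mismatch".
    """
    digest = sha256_of(path)
    known = index.get(vod_id)
    if known is None:
        index[vod_id] = digest
        return "new"
    return "unchanged" if known == digest else "mismatch"
```

A "mismatch" result is the situation in this issue: a second file claims an id that is already stored, but its content differs.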

@ScrubN
Collaborator

ScrubN commented Dec 8, 2024

Oh, I completely missed that. I genuinely have no clue how that could have happened as the ID stored in the JSON file is the exact same object as is used to fetch video info. The ID object is also never mutated, meaning this makes absolutely zero sense.

My only guess would have to be that at some point the file was mutated by another program. Either that or Twitch reused this specific video ID for the exact same streamer, which seems unlikely.

@EthanZeigler
Author

EthanZeigler commented Dec 9, 2024

That's why I'm suspicious of the twitch API. I'm using a modified version of https://github.com/cr08/TwitchVault to manage the data. It pulls a list of vods on the configured user and downloads them if they don't already exist. This seems to take into account the name associated with the vod as well, which technically could lead to an issue like this if the vod were renamed.

However, that's not what I'm seeing here because the data overlap is so bizarre. Clearly the API saw a vod with the same id as another at some point and downloaded the data associated with the id. That download had a different name and therefore wasn't seen automatically as a dupe by the wrapper. It invokes the twitchdownloader cli using the video ID as the source and sets the output to the vod id concatenated with the title as reported by twitch. In short, the twitch api did in fact at some point report a vod with this kind of bizarre data. The wrapper code and twitchdownloader hit the api independently and came to the same conclusion. I didn't link it here because it's not from this package, but the metadata reported by that wrapper agrees with what was in the chat download and the audio file it downloaded at the time is completely different.

TLDR i'm pretty confident twitch's api had a stroke. At least twice.

ls -alh output on the relevant files. The timestamp on archive_chat.json is the time the cli got invoked. The other metadata and audio files can get a little weird so i omitted them because i'm lazy about the data processing and import sometimes.

-rw-r--r-- 1 ethan ethan 3.0M Oct  3 21:47 '20241002 T225528Z - 2266136455 - THE CRITTER IS HEARTGOLD ARTLOCKE LURK DISCORD_archive_chat.json'
-rw-r--r-- 1 ethan ethan 3.0M Oct  3 00:01 '20241002 T225528Z - 2266136455 - THE CRITTER IS MONSTER DESIGN CONTEST LURK DISCORD_archive_chat.json'
-rw-r--r-- 1 ethan ethan 3.9M Oct 23 22:16 '20241016 T225454Z - 2277847598 - THE CRITTER IS BACK TO NUZLOCKING GRIND LURK DISCORD_archive_chat.json'
-rw-r--r-- 1 ethan ethan 3.9M Oct 18 00:09 '20241016 T225454Z - 2277847598 - THE CRITTER IS ON THE COMMISSION GRIND LURK DISCORD_archive_chat.json'

Both instances here exhibit the same behavior with the chapters, start time, end time, length, link, id, etc but the chats are correct somehow.
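A quick way to spot these pairs in a directory listing is to group the file names by the id embedded in them. The sketch below assumes the `<stamp> - <id> - <title>_archive_chat.json` naming seen in the listing above; the pattern and the function names are illustrative and not part of TwitchVault or TwitchDownloader.

```python
import re
from collections import defaultdict

# Matches names like:
# "20241002 T225528Z - 2266136455 - SOME TITLE_archive_chat.json"
NAME_PATTERN = re.compile(
    r"^(?P<stamp>\d{8} T\d{6}Z) - (?P<vod_id>\d+) - (?P<title>.+)_archive_chat\.json$"
)


def titles_by_vod_id(names):
    """Group the titles found in file names by their VOD id."""
    groups = defaultdict(set)
    for name in names:
        match = NAME_PATTERN.match(name)
        if match:  # silently skip names that do not follow the pattern
            groups[match["vod_id"]].add(match["title"])
    return groups


def conflicting_ids(names):
    """Return {vod_id: sorted titles} for ids stored under more than one title."""
    return {
        vod_id: sorted(titles)
        for vod_id, titles in titles_by_vod_id(names).items()
        if len(titles) > 1
    }
```

Run against the four names above, both ids would be reported, each with its two titles.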

Edit: I went through my data and can also confirm the returned chat data is for the wrong vod too. Twitch what the hell is going on over there?

For reference, this is my importer tool output. It noticed the issue in the data. Implementing the hash was in case i manually had to update data. Never thought it would end up catching this.

https://gist.github.com/EthanZeigler/3d84da575c4ea3a91989beb2d3c61118

@superbonaci
Contributor

superbonaci commented Dec 9, 2024

After checking, I think I know what may be happening.

The stream started on 16 September 2024 around 23:00.
5 minutes later the game changed, but the title was also changed.

So what can be happening is the following:

Instead of having one metadata.txt like TwitchDownloaderCLI does, which chooses only one of the titles for the main title, you split the metadata in two and converted it to json, so there's one chapter in each file, each with a different title. But titles and games may not change at the same time. Both json files are part of the same stream and have the same id.

The metadata.txt could include the game and the title in each chapter, but there's also the possibility that games and titles do not change at the same time so it will be more complex.

you can see the stream info here: https://streamscharts.com/channels/fourleafisland/streams/52006587565

the duration of each chapter is wrong in your json.

The metadata.txt once reconstructed will be something like:

;FFMETADATA1
title=THE CRITTER IS: returning to my barista roots !LURK !DISCORD (2252936436)
artist=fourleafisland
date=2024
comment=Created at: 2024-09-16 22:57:45Z\
Video id: 2252936436\
Views: 1681
[CHAPTER]
TIMEBASE=1/1000
START=0
END=300000
title=Game: The Closing Shift | Chapter: THE CRITTER IS: returning to my barista roots !LURK !DISCORD
[CHAPTER]
TIMEBASE=1/1000
START=300000
END=13200000
title=Game: Art | Chapter: THE CRITTER IS: chilllll drawinggg !LURK !DISCORD

the game and title inside chapters have to be put in the same place because there aren't other tags. Same for gameid and all the others if you want to add them, but the title gets too long.

the title change happens a lot when the guy starts a new stream but keeps the title from the previous one and updates it.
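As a rough sketch of how such a file could be generated from the chapter list in a chat JSON, the function below emits the same `;FFMETADATA1` / `[CHAPTER]` layout. It is not TwitchDownloaderCLI's implementation; `build_ffmetadata` and `escape` are hypothetical names, and the start/end values come straight from `startMilliseconds` and `lengthMilliseconds` in the JSON rather than from the tracker figures above. The escaping follows the ffmpeg metadata format, where `=`, `;`, `#`, `\` and newline are prefixed with a backslash.

```python
def escape(value: str) -> str:
    """Backslash-escape the characters that are special in ffmetadata files."""
    out = []
    for ch in value:
        if ch in "=;#\\\n":
            out.append("\\")
        out.append(ch)
    return "".join(out)


def build_ffmetadata(title: str, chapters: list) -> str:
    """Build ffmetadata text from chapters shaped like the chat JSON.

    Each chapter needs startMilliseconds, lengthMilliseconds and description.
    """
    lines = [";FFMETADATA1", f"title={escape(title)}"]
    for chapter in chapters:
        start = chapter["startMilliseconds"]
        end = start + chapter["lengthMilliseconds"]
        lines += [
            "[CHAPTER]",
            "TIMEBASE=1/1000",  # chapter times are given in milliseconds
            f"START={start}",
            f"END={end}",
            f"title={escape(chapter['description'])}",
        ]
    return "\n".join(lines) + "\n"
```

With the two chapters from the first JSON, the second chapter would come out as START=80000 and END=12966000, matching the `length` of 12966 seconds.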

@ScrubN
Collaborator

ScrubN commented Dec 9, 2024

Streamscharts encountered the same issue with the Twitch API. TwitchTracker correctly shows that they were 2 different streams; one on Sept. 13th and the other on Sept. 16th.

@EthanZeigler
Author

> After checking I think what may be happening.
>
> The stream started at 16 September 2024 around 23:00.
> 5 minutes lates the game changed, but the title was also changed.
>
> So what can be happening is the following:
>
> Instead of having the metadata.txt like TwitchDownloaderCLI does, and choose only one of the titles for the main title, you split the metadata in two and converted to json, so there's one chapter on each one and with different title. But titles and games may not change at the same time. Both json are part of the same stream and have the same id.
>
> The metadata.txt could include the game and the title in each chapter, but there's also the possibility that games and titles do not change at the same time so it will be more complex.
>
> you can see the stream info here: https://streamscharts.com/channels/fourleafisland/streams/52006587565
>
> the duration of each chapter is wrong in your json.
>
> The metadata.txt once reconstructed will be something like:
>
> ;FFMETADATA1
> title=THE CRITTER IS: returning to my barista roots !LURK !DISCORD (2252936436)
> artist=fourleafisland
> date=2024
> comment=Created at: 2024-09-16 22:57:45Z\
> Video id: 2252936436\
> Views: 1681
> [CHAPTER]
> TIMEBASE=1/1000
> START=0
> END=300000
> title=Game: The Closing Shift | Chapter: THE CRITTER IS: returning to my barista roots !LURK !DISCORD
> [CHAPTER]
> TIMEBASE=1/1000
> START=300000
> END=13200000
> title=Game: Art | Chapter: THE CRITTER IS: chilllll drawinggg !LURK !DISCORD
>
> the game and title inside chapters have to be put in same place because there isn't other tags. Same for gameid and all the other if you want to add them but the title is too long.
>
> the title change happens a lot when the guy starts a new stream but keeps the title from the previous one and updates it.

Appreciate the effort! I think you've misunderstood the problem a bit though. This isn't a case of UTC rollover. That was one of the first things I checked. In fact, the api in question returns data in local time where the stream did not cross the date barrier. This is the twitch api returning bogus data for a stream id.
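The rollover check can be reproduced from the numbers in the first post. The sketch below takes `created_at` and `length` from the chat JSON and compares calendar dates in UTC and in a fixed local offset. The -04:00 offset is borrowed from the `FileInfo.CreatedAt` fields and is only assumed to be the relevant local zone; `stream_window` and `crosses_midnight` are hypothetical helper names.

```python
from datetime import datetime, timedelta, timezone


def stream_window(created_at: str, length_seconds: int, utc_offset_hours: int):
    """Return (start_utc, end_utc, start_local, end_local) for a VOD.

    `created_at` is an ISO 8601 string ending in "Z", as in the chat JSON.
    """
    # Rewrite the trailing "Z" so older Python versions can parse it too.
    start_utc = datetime.fromisoformat(created_at.replace("Z", "+00:00"))
    end_utc = start_utc + timedelta(seconds=length_seconds)
    local = timezone(timedelta(hours=utc_offset_hours))
    return start_utc, end_utc, start_utc.astimezone(local), end_utc.astimezone(local)


def crosses_midnight(start: datetime, end: datetime) -> bool:
    """True when start and end fall on different calendar dates."""
    return start.date() != end.date()


start_utc, end_utc, start_local, end_local = stream_window(
    "2024-09-16T22:57:45Z", 12966, -4
)
print("UTC rollover:", crosses_midnight(start_utc, end_utc))
print("Local rollover:", crosses_midnight(start_local, end_local))
```

For these values the stream spans two dates in UTC but stays on Sept. 16th at -04:00, which is why trackers that bucket by UTC day and ones that bucket by local day can disagree about the date without disagreeing about the stream.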

@EthanZeigler
Author

EthanZeigler commented Dec 9, 2024

> Streamscharts encountered the same issue with the Twitch API. TwitchTracker correctly shows the they were 2 different streams; one on Sept. 13th and the other on Sept. 16th.

I guess my next question is why twitch tracker has this correct? I'm trying to build a self-hosted product that does similar things as these products but with more depth

@superbonaci
Contributor

superbonaci commented Dec 9, 2024

> Streamscharts encountered the same issue with the Twitch API. TwitchTracker correctly shows the they were 2 different streams; one on Sept. 13th and the other on Sept. 16th.

The stream on Sept. 13th is not related to this issue. It's a different stream that happens to have the same title as one of the chapters of the other: https://twitchtracker.com/fourleafisland/streams/44810626811

The issue you are seeing is that streams with UTC rollover from 16th to 17th are displayed in both days in the table in twitchtracker, but once you enter the details it's placed in either of them not in both days.

For example twitchtracker:

[images 1 and 2]

Monday 16th is missing:

[images 3 and 4]

The stream will be placed on 17th Sept.: https://twitchtracker.com/fourleafisland/streams/52006587565
There is no stream placed on the 16th with an id.

Streamcharts places it on 16th, same id 52006587565: https://streamscharts.com/channels/fourleafisland/streams/52006587565

[image 5]

One places it on the day it starts and the other on the day it ends, but the only important time is the "created_at".

@superbonaci
Contributor

The id 2252936436 is the same as 52006587565 because this one is for the vod and the other for the live stream. There's a Reddit post about it.

@superbonaci
Contributor

> Appreciate the effort! I think you've misunderstood the problem a bit though. This isn't a case of UTC rollover. That was one of the first things I checked. In fact, the api in question returns data in local time where the stream did not cross the date barrier. This is the twitch api returning bogus data for a stream id.

I don't understand exactly which data is bogus, could you tell me please?

@superbonaci
Contributor

I think you are not understanding it correctly, in both cases this is the same date and id:

 "id": "2252936436",
"created_at": "2024-09-16T22:57:45Z",

Because it's exactly the same video but there is one json for each chapter. The other date you see is when you created the json:

"CreatedAt": "2024-09-16T23:21:16.9694971-04:00",
"CreatedAt": "2024-09-18T00:57:02.0421814-04:00",

@superbonaci
Contributor

It created one json for each chapter despite having the chapters below you can see they are repeated exactly in both cases so that is redundant:

    "chapters": [
      {
        "id": "05f0636da035b42ca2411314878bde67",
        "startMilliseconds": 0,
        "lengthMilliseconds": 80000,
        "type": "GAME_CHANGE",
        "description": "Art",
        "subDescription": "",
        "thumbnailUrl": "",
        "gameId": "509660",
        "gameDisplayName": "Art",
        "gameBoxArtUrl": "https://static-cdn.jtvnw.net/ttv-boxart/509660-40x53.jpg"
      },
      {
        "id": "1405752318707a2540edb9e88a1a54f5",
        "startMilliseconds": 80000,
        "lengthMilliseconds": 12886000,
        "type": "GAME_CHANGE",
        "description": "Art",
        "subDescription": "",
        "thumbnailUrl": "",
        "gameId": "509660",
        "gameDisplayName": "Art",
        "gameBoxArtUrl": "https://static-cdn.jtvnw.net/ttv-boxart/509660-40x53.jpg"
      }

@superbonaci
Contributor

If you want to use trackers to do very accurate, in-depth things, forget about it. They are not as reliable as catching the streams while they are live on twitch or in the vods section.

They are just to double check in most cases it's ok.

@superbonaci
Contributor

The start and end times are both the same in both json, it's not calculating the corresponding times for each chapter:

    "start": 0,
    "end": 12966,

The program you are using to create those json does not work well. If it's TwitchVault it hasn't been updated since Oct 19, 2023.

@EthanZeigler
Author

EthanZeigler commented Dec 11, 2024

@superbonaci the program that's making those json files... is twitchdownloader. Twitchvault is a wrapper around twitchdownloader that automatically runs the download for unsaved vods, clips, chats, etc. Any other generation related to twitchvault is storing the raw information returned from the twitch graphql api directly in the same exact manner this project does. See https://github.com/cr08/TwitchVault/blob/c5eda2d5e3d4bf6a6f0b9801f322916dfe143b05/videos.py#L150-L246. It just runs twitch downloader via a shell command.

Chapters are only handled within twitch downloader. Vault, for vod and chat downloads (i'm not using clips), doesnt care about them. It's just reading the same graphql api.

public static async Task<GqlVideoChapterResponse> GetOrGenerateVideoChapters(long videoId, VideoInfo videoInfo)

Twitch vault only stores basic metadata from the graphql api. The way im using it means it doesnt need anything more. https://github.com/cr08/TwitchVault/blob/c5eda2d5e3d4bf6a6f0b9801f322916dfe143b05/utils.py#L84
However what information it does download corresponds with what twitch downloader stored as you can see in the examples above.

@EthanZeigler
Author

EthanZeigler commented Dec 11, 2024

TLDR this is on twitch and probably not something either project can do anything about. I made this issue less for "downloader is messing up" and more for "be aware this is a thing twitch's api just decides to do"

@superbonaci
Contributor

But metadata.txt has the correct information. The only thing missing is that when there are title changes it only uses one of them, not sure which one.

@superbonaci
Contributor

Looks like a lot of things get mixed up in the json files.

@ScrubN
Collaborator

ScrubN commented Dec 12, 2024

The chat json metadata files are prepared almost identically to the ffmpeg metadata files. I can assure you that this is not a problem with how TD handles data from Twitch.

@EthanZeigler
Author

I have another example of bizarre behavior now from november 30th, but i need to double check what we're seeing is the same issue before sharing

@superbonaci
Contributor

Yes if you are going to share it make sure it's a vod that we can download and check.

@EthanZeigler
Author

EthanZeigler commented Jan 3, 2025

redacted name is for search engine reasons

{
    "id": "2340216348",
    "user_id": "436970891",
    "user_name": "<redacted>",
    "title": "doodlin! !DONO !LURK !DISCORD",
    "type": "archive",
    "duration": "2h32m27s",
    "url": "https://www.twitch.tv/videos/2340216348",
    "views": 6,
    "moments": [
        {
            "duration": 84,
            "offset": 0,
            "id": "509660",
            "name": "Art",
            "type": "GAME_CHANGE"
        },
        {
            "duration": 0,
            "offset": 84,
            "id": "2085980140",
            "name": "Lethal Company",
            "type": "GAME_CHANGE"
        }
    ],
    "muted_segments": [],
    "recorded_at": "2024-12-31T00:02:32Z",
    "recorded_at_iso": "20241231 T000232Z"
}
{
    "id": "2340216348",
    "user_id": "436970891",
    "user_name": "<redacted>",
    "title": "LETHAL COMPANY CHRISTMAS !DONO !LURK !DISCORD",
    "type": "archive",
    "duration": "2h32m27s",
    "url": "https://www.twitch.tv/videos/2340216348",
    "views": 1572,
    "moments": [
        {
            "duration": 84,
            "offset": 0,
            "id": "2085980140",
            "name": "Lethal Company",
            "type": "GAME_CHANGE"
        },
        {
            "duration": 9063,
            "offset": 84,
            "id": "2085980140",
            "name": "Lethal Company",
            "type": "GAME_CHANGE"
        }
    ],
    "muted_segments": [],
    "recorded_at": "2024-12-31T00:02:32Z",
    "recorded_at_iso": "20241231 T000232Z"
}

[image]
The doodlin vod (pokemon from memory) got overwritten with the lethal company one.

@superbonaci
Contributor

superbonaci commented Jan 4, 2025

This is the metadata.txt:

;FFMETADATA1
title=LETHAL COMPANY CHRISTMAS !DONO !LURK !DISCORD (2340216348)
artist=redacted
date=2024
genre=Lethal Company
comment=Created at: 2024-12-31 00:02:32Z\
Video id: 2340216348\
Views: 1588
[CHAPTER]
TIMEBASE=1/1000
START=0
END=84000
title=Lethal Company
[CHAPTER]
TIMEBASE=1/1000
START=84000
END=9147000
title=Lethal Company

The only difference I see is that he changed the title of the stream 4 times, the last one is the current one on Twitch and maybe it was changed after stream end:

https://streamscharts.com/channels/fourleafisland/streams/43477083080

doodlin! !DONO !LURK !DISCORD
CRITTER LETHAL COMPANY !DONO !LURK !DISCORD
CROWD CONTROL LETHAL COMPANY !DONO !LURK !DISCORD
LETHAL COMPANY CHRISTMAS !DONO !LURK !DISCORD

You should not save the stream with the title or chapter (game), because they can be changed several times during the stream and also after it. Why do you bother doing it that way? Save with the video ID or something, do not save by chapters unless you are sure there are no bugs.

Still I don't see what's your issue.

@superbonaci
Contributor

I don't know if there can be any bug when title and game don't change at the same time, for example:

0 - start with one title and one game
1h - change title
2h - change game
3h - end stream

The metadata.txt will only have one title and the 2 games (the chapters), but if you try to be very precise and log everything some bugs may happen. I don't know if the Twitch API can be that accurate, all that information should be available somehow after the stream ends, maybe @ScrubN knows...

@EthanZeigler What is it that you want to do exactly, what do you find wrong?

@superbonaci
Contributor

It's working fine now, I think you just need to wait until the stream ends and Twitch processes the video correctly. The same goes for any edits made after the stream ends.

{
  "data": {
    "video": {
      "viewCount": 1591,
      "moments": {
        "pageInfo": {
          "hasNextPage": false
        },
        "edges": [
          {
            "node": {
              "details": {
                "game": {
                  "id": "2085980140",
                  "displayName": "Lethal Company",
                  "name": "Lethal Company"
                }
              },
              "positionMilliseconds": 0,
              "durationMilliseconds": 84000,
              "type": "GAME_CHANGE"
            }
          },
          {
            "node": {
              "details": {
                "game": {
                  "id": "2085980140",
                  "displayName": "Lethal Company",
                  "name": "Lethal Company"
                }
              },
              "positionMilliseconds": 84000,
              "durationMilliseconds": 9063000,
              "type": "GAME_CHANGE"
            }
          }
        ]
      }
    }
  },
  "extensions": {
    "durationMilliseconds": 29,
    "requestID": "<redacted>"
  }
}
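For comparing responses like this one against the stored JSON, it can help to flatten the `moments` edges into simple chapter records. The sketch below only relies on the fields visible in the response above; it makes no claim about the full GraphQL schema, and `chapters_from_response` is a hypothetical name.

```python
def chapters_from_response(response: dict) -> list:
    """Flatten data.video.moments.edges into a list of chapter dicts."""
    edges = response["data"]["video"]["moments"]["edges"]
    chapters = []
    for edge in edges:
        node = edge["node"]
        # "details" or "game" may be missing or null for some moments,
        # so fall back to an empty dict instead of raising.
        game = (node.get("details") or {}).get("game") or {}
        chapters.append(
            {
                "game_id": game.get("id"),
                "name": game.get("displayName"),
                "start_ms": node["positionMilliseconds"],
                "length_ms": node["durationMilliseconds"],
                "type": node["type"],
            }
        )
    return chapters
```

Summing `length_ms` over the two chapters in this response gives 9147000 ms, which lines up with the "2h32m27s" duration reported for the VOD.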

@EthanZeigler
Author

interesting. that gives me a theory on what might be happening then. Is it possible the automation for finding the channel ids is running into a race condition with the vod finalization process? Will experiment and report back in the future.

@superbonaci
Contributor

superbonaci commented Jan 4, 2025

You mean the game ID, which is always the same for each game? The channel and the user have their own ID but that's a different thing.

                "game": {
                  "id": "2085980140",
                  "displayName": "Lethal Company",
                  "name": "Lethal Company"
                }
        {
            "duration": 84,
            "offset": 0,
            "id": "509660",
            "name": "Art",
            "type": "GAME_CHANGE"
        },

Edit: looks like the chapters have also their own id which should not duplicate:

    "chapters": [
      {
        "id": "05f0636da035b42ca2411314878bde67",
        "startMilliseconds": 0,
        "lengthMilliseconds": 80000,
        "type": "GAME_CHANGE",
        "description": "Art",
        "subDescription": "",
        "thumbnailUrl": "",
        "gameId": "509660",
        "gameDisplayName": "Art",
        "gameBoxArtUrl": "https://static-cdn.jtvnw.net/ttv-boxart/509660-40x53.jpg"
      },
      {
        "id": "1405752318707a2540edb9e88a1a54f5",
        "startMilliseconds": 80000,
        "lengthMilliseconds": 12886000,
        "type": "GAME_CHANGE",
        "description": "Art",
        "subDescription": "",
        "thumbnailUrl": "",
        "gameId": "509660",
        "gameDisplayName": "Art",
        "gameBoxArtUrl": "https://static-cdn.jtvnw.net/ttv-boxart/509660-40x53.jpg"
      }

Each VOD has its own ID, and so does each live stream; the two are not the same.

There are so many types of ID that you must be sure what you are looking for, I'm not sure if that's even documented.

@superbonaci
Contributor

I think the duration 0:

    "moments": [
        {
            "duration": 84,
            "offset": 0,
            "id": "509660",
            "name": "Art",
            "type": "GAME_CHANGE"
        },
        {
            "duration": 0,
            "offset": 84,
            "id": "2085980140",
            "name": "Lethal Company",
            "type": "GAME_CHANGE"
        }
    ],

Is because you are reading the info while the stream is live, check here:

// When downloading VODs of currently-airing streams, the last chapter lacks a duration
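A possible way to patch such a snapshot after the fact is to fill the open last moment from the VOD's reported duration. This is only a sketch over the wrapper's `moments` format shown above, not what TwitchDownloader does internally; `duration_to_seconds` and `close_last_moment` are made-up names.

```python
import re


def duration_to_seconds(text: str) -> int:
    """Convert a duration such as "2h32m27s" into seconds."""
    match = re.fullmatch(r"(?:(\d+)h)?(?:(\d+)m)?(?:(\d+)s)?", text)
    if not match or not any(match.groups()):
        raise ValueError(f"unrecognised duration: {text!r}")
    hours, minutes, seconds = (int(group or 0) for group in match.groups())
    return hours * 3600 + minutes * 60 + seconds


def close_last_moment(moments: list, video_seconds: int) -> list:
    """Return a copy of `moments` where a zero-length last entry is extended
    to the end of the video. Offsets and durations are in seconds."""
    fixed = [dict(moment) for moment in moments]
    if fixed and fixed[-1]["duration"] == 0:
        last = fixed[-1]
        last["duration"] = max(video_seconds - last["offset"], 0)
    return fixed
```

For the first snapshot above ("2h32m27s", last moment at offset 84 with duration 0) this yields a duration of 9063, the same value the later snapshot reports.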

@EthanZeigler
Author

EthanZeigler commented Jan 5, 2025

Okay, i've confirmed it. It's a race condition bug within the twitch api.

Between when a stream has ended and the vod is still being finalized, the api returns incorrect information for the previous vod. This leads to the script I'm using identifying the already downloaded vod as a different one from the one previously downloaded, since the title and first chapter change. It redownloads that vod data and stores it under the incorrect name as returned by the api.

For example, if the just finished stream is going to be assigned id 345 and the last recorded and available vod is 123, calls to the twitch api for 123 return 123's data with the first chapter information of vod 345 overwriting 123's and the title of 123 being overwritten by 345's first chapter title. After the vod is finalized (and presumably if there's a cache it has flushed), 123 begins returning correct data once again.

In a more distilled form,
vod 123, titled "abc", chapter 1 titled "uvw"
vod 345, titled "cde", chapter 1 titled "xyz"

Stream running

vods = api.get_vods()
-> [123]

for vod in vods:
    if vod <timestamp>_<id>_<title> exists:
        print("vod {vod} exists")
        continue
    else:
        print("vod {vod} downloading")
        api.download_vod(vod)
-> "vod 123|abc exists"

Stream ended, vod finalized

vods = api.get_vods()
-> [123, 345]

for vod in vods:
    if vod <timestamp>_<id>_<title> exists:
        print("vod {vod} exists")
        continue
    else:
        print("vod {vod} downloading")
        api.download_vod(vod)
-> "vod 123|abc exists\nvod 345|cde downloading"

Stream ended, vod not finalized (bug)

vods = api.get_vods()
-> [123]

for vod in vods:
    if vod <timestamp>_<id>_<title> exists:
        print("vod {vod} exists")
        continue
    else:
        print("vod {vod} downloading")
        api.download_vod(vod)
-> "vod 123|xyz downloading"

As far as twitch downloader is concerned, idk how one works around this. Without knowing what the vod title should be, there's no way i can think of to verify the returned data is correct.
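On the wrapper side, one partial mitigation might be to key the "already downloaded" check on the vod id alone and treat a changed title for a known id as something to review rather than a new download. The sketch below assumes an in-memory record of titles; the `Archive` class and its `decide` method are hypothetical and not part of TwitchVault or TwitchDownloader. It does not verify that the returned data is correct, it only avoids storing a second copy under a different name.

```python
from dataclasses import dataclass, field


@dataclass
class Archive:
    """Remembers the title each VOD id had when it was first downloaded."""

    titles: dict = field(default_factory=dict)  # vod_id -> first seen title

    def decide(self, vod_id: str, title: str) -> str:
        """Return "download", "skip" or "review" for a listed VOD."""
        known = self.titles.get(vod_id)
        if known is None:
            self.titles[vod_id] = title
            return "download"
        if known == title:
            return "skip"
        # Same id, different title: possibly a rename, possibly the
        # finalization race described above. Leave it for a later re-check.
        return "review"


archive = Archive()
print(archive.decide("123", "abc"))  # first sighting of the id
print(archive.decide("123", "abc"))  # same id and title again
print(archive.decide("123", "xyz"))  # same id, title changed
```

A "review" result could be retried after a delay, on the observation above that the api goes back to returning the original data once the new vod is finalized.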

@superbonaci
Contributor

What does the metadata.txt that TwitchDownloaderCLI returns look like? Isn't that good enough?
