[
  {
    "start": 0.08,
    "end": 3.92,
    "text": "There's kind of a trend,\nif you wanna call it like"
  },
  {
    "start": 3.94,
    "end": 7.16,
    "text": "this, or definitely something we see"
  },
  {
    "start": 7.56,
    "end": 11.51,
    "text": "become more common over the last one\nor two years, which is"
  },
  {
    "start": 11.51,
    "end": 14.66,
    "text": "not really surprising, but nonetheless,\nsomething I wanna talk"
  },
  {
    "start": 14.72,
    "end": 18.5,
    "text": "about. The trend of APIs of"
  },
  {
    "start": 18.56,
    "end": 22.28,
    "text": "websites or web services becoming more and"
  },
  {
    "start": 22.36,
    "end": 26.1,
    "text": "more locked down or private or more"
  },
  {
    "start": 26.14,
    "end": 28.73,
    "text": "expensive to use,\nwhatever you wanna call it."
  },
  {
    "start": 28.78,
    "end": 32.36,
    "text": "And the most recent example, which\nis the reason why I'm creating this video"
  },
  {
    "start": 32.4,
    "end": 35.66,
    "text": "here, is the Reddit API,"
  },
  {
    "start": 35.74,
    "end": 39.72,
    "text": "because, uh, two days ago,\nthere's been a post by the,"
  },
  {
    "start": 39.74,
    "end": 43.48,
    "text": "the Reddit team in the Reddit development\nforum where they"
  },
  {
    "start": 43.56,
    "end": 47.379,
    "text": "essentially announced\nthat the usage of their API now"
  },
  {
    "start": 47.46,
    "end": 50.54,
    "text": "needs approval. So there is an approval"
  },
  {
    "start": 50.9,
    "end": 54.4,
    "text": "process for using the API for"
  },
  {
    "start": 54.46,
    "end": 58.08,
    "text": "responsible use to support responsible"
  },
  {
    "start": 58.1,
    "end": 61.98,
    "text": "builders, and I'll get back to that\nand what that means, but it's"
  },
  {
    "start": 62.0,
    "end": 65.679,
    "text": "kind of in line,\nif you wanna call it like this, to what"
  },
  {
    "start": 65.74,
    "end": 69.7,
    "text": "Twitter, X, did two years ago already."
  },
  {
    "start": 69.72,
    "end": 72.98,
    "text": "They made their API really"
  },
  {
    "start": 73.34,
    "end": 76.82,
    "text": "expensive to use, at least at scale."
  },
  {
    "start": 76.86,
    "end": 80.74,
    "text": "So if you wanna interact with Twitter,\nwith X, programmatically,"
  },
  {
    "start": 80.8,
    "end": 84.6,
    "text": "if you wanna build yet another social\nmedia scheduling tool and you wanna"
  },
  {
    "start": 84.62,
    "end": 88.42,
    "text": "support X, well, that could get expensive"
  },
  {
    "start": 88.48,
    "end": 92.28,
    "text": "depending on how you build it, uh,\nbecause the, the free"
  },
  {
    "start": 92.4,
    "end": 96.0,
    "text": "usage is quite limited. You can,\nfor example, read"
  },
  {
    "start": 96.34,
    "end": 100.24,
    "text": "100 posts per month and write 500"
  },
  {
    "start": 100.32,
    "end": 103.3,
    "text": "posts per month,\nwhich might be more than enough"
  },
  {
    "start": 104.02,
    "end": 107.22,
    "text": "for your own little tool\nthat you're building for"
  },
  {
    "start": 107.32,
    "end": 111.2,
    "text": "yourself, but if you\nare building a SaaS product"
  },
  {
    "start": 111.24,
    "end": 115.0,
    "text": "on top of the X API,\nthat will not suffice, so you'll have to"
  },
  {
    "start": 115.04,
    "end": 118.9,
    "text": "pay,\nbut not even the basic tier might be"
  },
  {
    "start": 118.94,
    "end": 122.52,
    "text": "case. It might be, it might not be.\nAnd the pro tier might"
  },
  {
    "start": 122.76,
    "end": 126.74,
    "text": "still not be enough. Now chances are,\nit may be enough, but it"
  },
  {
    "start": 126.84,
    "end": 130.538,
    "text": "is also, uh, quite expensive."
  },
  {
    "start": 130.6,
    "end": 134.43,
    "text": "And now for Reddit, there, as I"
  },
  {
    "start": 134.48,
    "end": 138.24,
    "text": "mentioned, i- it's not about paying\nor about, uh, the"
  },
  {
    "start": 138.3,
    "end": 142.08,
    "text": "price they're asking,\nbut it's about an approval"
  },
  {
    "start": 142.12,
    "end": 146.06,
    "text": "process so that not every application can\nstart using their"
  },
  {
    "start": 146.1,
    "end": 150.04,
    "text": "API. And the question of course is,\nwhy are companies"
  },
  {
    "start": 150.18,
    "end": 154.16,
    "text": "doing that? Well, there\nare a couple of reasons and"
  },
  {
    "start": 154.22,
    "end": 157.96,
    "text": "one big important reason. Obviously,\nyou could say, why would they"
  },
  {
    "start": 158.1,
    "end": 161.72,
    "text": "not do it?\nWhy would they give you access to their"
  },
  {
    "start": 161.88,
    "end": 164.82,
    "text": "free? And you could argue, well,\nbecause in the"
  },
  {
    "start": 164.88,
    "end": 168.86,
    "text": "past, before AI, they may have"
  },
  {
    "start": 168.9,
    "end": 172.54,
    "text": "benefited from doing so. Because\nif people can build"
  },
  {
    "start": 172.58,
    "end": 175.94,
    "text": "products on top of, let's say, X,"
  },
  {
    "start": 176.08,
    "end": 179.6,
    "text": "if I can build a social media scheduling"
  },
  {
    "start": 179.68,
    "end": 183.2,
    "text": "application,\nthat might be in Twitter's and X's"
  },
  {
    "start": 183.36,
    "end": 187.09,
    "text": "interest because more posts on X could"
  },
  {
    "start": 187.14,
    "end": 190.46,
    "text": "mean more engagement, uh,\nmore people reading and"
  },
  {
    "start": 190.5,
    "end": 194.46,
    "text": "interacting with those posts, so\nthat might not be too"
  },
  {
    "start": 194.5,
    "end": 197.84,
    "text": "bad. And there is a reason why you can"
  },
  {
    "start": 197.98,
    "end": 201.22,
    "text": "write more than you can read. You"
  },
  {
    "start": 201.32,
    "end": 204.98,
    "text": "could think that you should be able to\nread more than write because"
  },
  {
    "start": 205.0,
    "end": 208.58,
    "text": "writes are more expensive to their\ndatabase, to their infrastructure"
  },
  {
    "start": 208.62,
    "end": 212.36,
    "text": "it's the opposite.\nThey allow you to write more than they"
  },
  {
    "start": 212.4,
    "end": 216.3,
    "text": "read. And just as a side note, X also"
  },
  {
    "start": 216.44,
    "end": 219.62,
    "text": "has a new program which they're testing,\nit's a pilot right"
  },
  {
    "start": 219.68,
    "end": 223.52,
    "text": "now, uh, where they, um, want to give"
  },
  {
    "start": 223.58,
    "end": 226.85,
    "text": "you, uh, pay per use access to their API."
  },
  {
    "start": 226.86,
    "end": 230.31,
    "text": "But it stays the same.\nYou have to pay to use it and it can get"
  },
  {
    "start": 230.32,
    "end": 233.06,
    "text": "expensive. Now, why\nare companies doing that?"
  },
  {
    "start": 233.1,
    "end": 236.54,
    "text": "Well, the big answer, of course, is AI, or"
  },
  {
    "start": 236.9,
    "end": 240.78,
    "text": "specifically, of course, gen AI. Because"
  },
  {
    "start": 240.86,
    "end": 244.5,
    "text": "with the rise of gen AI, it has become"
  },
  {
    "start": 244.58,
    "end": 248.56,
    "text": "clear that all that data\nwhich these companies own, all"
  },
  {
    "start": 248.62,
    "end": 252.04,
    "text": "these Reddit posts, all the posts on X,"
  },
  {
    "start": 252.68,
    "end": 256.22,
    "text": "that is a valuable resource because"
  },
  {
    "start": 256.24,
    "end": 259.42,
    "text": "those gen AI models, of"
  },
  {
    "start": 259.519,
    "end": 263.46,
    "text": "course, need data in their training"
  },
  {
    "start": 263.5,
    "end": 267.15,
    "text": "or for their training process. They,\ndata is the"
  },
  {
    "start": 267.3,
    "end": 271.2,
    "text": "most important thing there because as we\nall know, ChatGPT"
  },
  {
    "start": 271.26,
    "end": 274.96,
    "text": "or the GPT models\nwere trained essentially on the entire"
  },
  {
    "start": 274.969,
    "end": 278.84,
    "text": "data, publicly available data,\nyou could find on the internet."
  },
  {
    "start": 279.0,
    "end": 282.87,
    "text": "Um, and still,\nthese models will need vast amounts of"
  },
  {
    "start": 282.88,
    "end": 286.56,
    "text": "data for their training. Nowadays,\nof course, there is the entire concept or"
  },
  {
    "start": 286.68,
    "end": 290.5,
    "text": "idea of using synthetic data as well as\nreal"
  },
  {
    "start": 290.54,
    "end": 294.28,
    "text": "data for the training process,\nand to my understanding, that"
  },
  {
    "start": 294.66,
    "end": 298.63,
    "text": "seems to work quite well,\nthough we'll see if that maybe still is"
  },
  {
    "start": 298.66,
    "end": 302.64,
    "text": "a problem and there is like a,\na ceiling due to the limited"
  },
  {
    "start": 302.68,
    "end": 306.28,
    "text": "data that's available because the entire\ndata in the internet has already been"
  },
  {
    "start": 306.32,
    "end": 309.74,
    "text": "consumed,\nso now you're just generating more"
  },
  {
    "start": 309.78,
    "end": 313.51,
    "text": "from that knowledge that\nwas gathered from that"
  },
  {
    "start": 313.54,
    "end": 316.06,
    "text": "internet data,\nso there might be a ceiling there."
  },
  {
    "start": 316.08,
    "end": 319.88,
    "text": "It's not entirely clear yet. Um,\nbut anyways,"
  },
  {
    "start": 319.9,
    "end": 323.79,
    "text": "data is super important\nand of course there's still new data being"
  },
  {
    "start": 323.86,
    "end": 327.66,
    "text": "Now more data than ever\nis generated by AI though, to be"
  },
  {
    "start": 327.68,
    "end": 331.59,
    "text": "fair,\nso that is synthetic data in the end,"
  },
  {
    "start": 331.68,
    "end": 335.22,
    "text": "lots of data is being generated, uh,\nincluding some data by humans"
  },
  {
    "start": 335.58,
    "end": 339.46,
    "text": "on X and Reddit every day, and of course,"
  },
  {
    "start": 339.5,
    "end": 343.38,
    "text": "those platforms don't want to give away\nthat data for free"
  },
  {
    "start": 343.39,
    "end": 347.26,
    "text": "anymore.\nThey did in the past because we didn't see"
  },
  {
    "start": 347.32,
    "end": 351.13,
    "text": "coming with, uh,\nlarge language models and, um,"
  },
  {
    "start": 351.16,
    "end": 354.88,
    "text": "now of course they want to protect their\ndata because a site"
  },
  {
    "start": 354.98,
    "end": 358.72,
    "text": "like X of course sits on lots"
  },
  {
    "start": 359.16,
    "end": 362.98,
    "text": "of data, lots of valuable posts,\nat least to some degree,"
  },
  {
    "start": 363.02,
    "end": 366.76,
    "text": "let's be honest.Most of the posts\nare total BS, but at least"
  },
  {
    "start": 366.88,
    "end": 370.74,
    "text": "some decent posts there\nand definitely valuable in the sense"
  },
  {
    "start": 370.8,
    "end": 374.78,
    "text": "of being valuable for training.\nAnd those sites don't"
  },
  {
    "start": 374.8,
    "end": 378.4,
    "text": "wanna give that data away for free\nanymore, which is why"
  },
  {
    "start": 378.42,
    "end": 382.4,
    "text": "they're locking it down. There also\nis a reason why we"
  },
  {
    "start": 382.44,
    "end": 385.21,
    "text": "see more and more web scraping"
  },
  {
    "start": 385.21,
    "end": 389.18,
    "text": "businesses, uh,\ncoming up almost every day because now"
  },
  {
    "start": 389.2,
    "end": 392.24,
    "text": "with large language models, even\nif we ignore the training"
  },
  {
    "start": 392.34,
    "end": 396.16,
    "text": "part, many of the applications\nthat we wanna build"
  },
  {
    "start": 396.22,
    "end": 399.98,
    "text": "with help of large language models\nor on top of large language models"
  },
  {
    "start": 400.1,
    "end": 403.44,
    "text": "will need access to recent data.\nIf you're building a"
  },
  {
    "start": 403.45,
    "end": 407.34,
    "text": "smart chatbot and you're using OpenAI's\nmodels under the"
  },
  {
    "start": 407.36,
    "end": 411.3,
    "text": "hood,\nyou probably wanna pull in some recent"
  },
  {
    "start": 411.38,
    "end": 415.08,
    "text": "chatbot more useful.\nYou wanna add web search, you"
  },
  {
    "start": 415.16,
    "end": 418.68,
    "text": "wanna be able to have your chatbot answer"
  },
  {
    "start": 418.74,
    "end": 422.18,
    "text": "questions related to the most recent posts\non X."
  },
  {
    "start": 422.22,
    "end": 426.11,
    "text": "So you wanna pull\nthat data into your application and"
  },
  {
    "start": 426.16,
    "end": 430.14,
    "text": "then reach your chat history\nand the prompts you sent to"
  },
  {
    "start": 430.18,
    "end": 434.12,
    "text": "the large language model with\nthat data that then hopefully"
  },
  {
    "start": 434.13,
    "end": 437.56,
    "text": "allows the model to generate a better\nanswer, uh, for what the user"
  },
  {
    "start": 437.7,
    "end": 441.54,
    "text": "asked. And that's why these sites are"
  },
  {
    "start": 441.64,
    "end": 445.47,
    "text": "kind of locking down their APIs to make it\nharder to get"
  },
  {
    "start": 445.58,
    "end": 448.29,
    "text": "access to the data because in the past\nthey gave it away for free."
  },
  {
    "start": 448.34,
    "end": 452.24,
    "text": "They don't wanna do that anymore.\nObviously, there still are ways to"
  },
  {
    "start": 452.26,
    "end": 456.02,
    "text": "get that data, as I mentioned.\nThere's a plethora of web"
  },
  {
    "start": 456.22,
    "end": 459.98,
    "text": "crawling companies\nand not all of these companies, uh,"
  },
  {
    "start": 460.0,
    "end": 463.39,
    "text": "respect the fact\nthat certain sites don't want to get"
  },
  {
    "start": 463.54,
    "end": 467.52,
    "text": "crawled.\nNow I did actually a livestream on"
  },
  {
    "start": 467.58,
    "end": 471.55,
    "text": "the topic of building our own web crawler,\num,"
  },
  {
    "start": 471.78,
    "end": 475.42,
    "text": "a while back, and I'll, uh,\nprovide a link to that"
  },
  {
    "start": 475.88,
    "end": 479.57,
    "text": "episode. You can watch the full, uh,\nlivestream episode, uh,"
  },
  {
    "start": 479.66,
    "end": 483.22,
    "text": "below this episode, of course.\nSo you can build your own"
  },
  {
    "start": 483.26,
    "end": 485.8,
    "text": "crawler and I did that with Crawl4AI."
  },
  {
    "start": 485.82,
    "end": 489.68,
    "text": "And essentially what you're building there\nis, um, an"
  },
  {
    "start": 489.78,
    "end": 492.6,
    "text": "application that spins up a browser and"
  },
  {
    "start": 493.04,
    "end": 496.88,
    "text": "simulates being a user\nand visiting a website to then"
  },
  {
    "start": 496.92,
    "end": 500.46,
    "text": "extract that website content,\nto extract the"
  },
  {
    "start": 500.5,
    "end": 504.39,
    "text": "rendered HTML content and so on. Um,\nthat is"
  },
  {
    "start": 504.52,
    "end": 507.76,
    "text": "how you can build\nand use a crawler in the livestream."
  },
  {
    "start": 507.98,
    "end": 511.89,
    "text": "I just, um,\nbuilt it to crawl my own website,"
  },
  {
    "start": 511.94,
    "end": 515.86,
    "text": "that clear. So I, uh,\ndid not start crawling X"
  },
  {
    "start": 515.9,
    "end": 519.64,
    "text": "there because web scraping and crawling is"
  },
  {
    "start": 519.7,
    "end": 523.539,
    "text": "kind of a gray zone. And, uh, there are"
  },
  {
    "start": 523.56,
    "end": 527.22,
    "text": "many sites that clearly state in their\nterms that they do"
  },
  {
    "start": 527.34,
    "end": 531.33,
    "text": "not allow web crawling. So you\nare violating those terms if you"
  },
  {
    "start": 531.46,
    "end": 534.84,
    "text": "do, which is why sites like Firecrawl, for"
  },
  {
    "start": 534.9,
    "end": 538.66,
    "text": "example, won't crawl X links. If I"
  },
  {
    "start": 539.18,
    "end": 543.06,
    "text": "take an X link and I try to scrape"
  },
  {
    "start": 543.1,
    "end": 546.86,
    "text": "that, uh, I'll get an error that this\nis not"
  },
  {
    "start": 546.92,
    "end": 550.9,
    "text": "supported.\nOr actually here it doesn't even start as"
  },
  {
    "start": 550.94,
    "end": 553.0,
    "text": "it seems in the past, I did get an error."
  },
  {
    "start": 553.52,
    "end": 555.71,
    "text": "So there are sites that don't allow that."
  },
  {
    "start": 555.74,
    "end": 559.45,
    "text": "There probably also are sites that do,\nand you can"
  },
  {
    "start": 559.48,
    "end": 562.67,
    "text": "definitely build your own crawler\nthat doesn't give"
  },
  {
    "start": 563.78,
    "end": 567.56,
    "text": "anything about anything\nand extract any content of any"
  },
  {
    "start": 567.66,
    "end": 571.28,
    "text": "site you wanna extract. Now I will say,\nof course, that many"
  },
  {
    "start": 571.36,
    "end": 575.1,
    "text": "sites also try to implement some technical\nhurdles that make it"
  },
  {
    "start": 575.16,
    "end": 578.93,
    "text": "harder to crawl them, but in the end\nif you really want to, you can"
  },
  {
    "start": 578.98,
    "end": 581.21,
    "text": "get around pretty much all of them."
  },
  {
    "start": 581.26,
    "end": 585.14,
    "text": "It might not be legal,\nit might be violating their terms, but"
  },
  {
    "start": 585.18,
    "end": 589.13,
    "text": "it is possible. Because, and\nthat takes us back"
  },
  {
    "start": 589.16,
    "end": 593.0,
    "text": "to the beginning, to the main topic,\nbecause of course all that data"
  },
  {
    "start": 593.2,
    "end": 597.04,
    "text": "is super valuable. However, this does have"
  },
  {
    "start": 597.42,
    "end": 601.08,
    "text": "I believe a real downside or a, an"
  },
  {
    "start": 601.14,
    "end": 604.48,
    "text": "implication that's not great for us as"
  },
  {
    "start": 604.49,
    "end": 608.4,
    "text": "developers. Because I totally get\nthat these sites don't wanna give"
  },
  {
    "start": 608.46,
    "end": 611.83,
    "text": "away access to their data,\nand just as a side note,"
  },
  {
    "start": 611.86,
    "end": 615.84,
    "text": "it's kind of not their data.\nIt's the data of the users using the"
  },
  {
    "start": 615.88,
    "end": 618.22,
    "text": "site, but that's a whole different story."
  },
  {
    "start": 618.24,
    "end": 621.65,
    "text": "But I get that they don't wanna give away\naccess to this data."
  },
  {
    "start": 621.68,
    "end": 625.35,
    "text": "The problem of course is\nthat kind of as a, a,"
  },
  {
    "start": 625.74,
    "end": 629.59,
    "text": "an additional casualty,\nwe as developers are limited"
  },
  {
    "start": 629.62,
    "end": 632.47,
    "text": "in what we can build, uh, with those APIs."
  },
  {
    "start": 632.47,
    "end": 636.34,
    "text": "Sure,\nyou might get approval for the Reddit API"
  },
  {
    "start": 636.38,
    "end": 640.26,
    "text": "something they're happy with.\nBut of course you also might"
  },
  {
    "start": 640.34,
    "end": 644.26,
    "text": "not get that approval. And for X you have"
  },
  {
    "start": 644.4,
    "end": 647.88,
    "text": "to pay quite a bit of money depending on\nwhat you're"
  },
  {
    "start": 647.92,
    "end": 651.5,
    "text": "building,\neven if what you're building has nothing"
  },
  {
    "start": 651.54,
    "end": 655.28,
    "text": "extracting that data\nand using it for model"
  },
  {
    "start": 655.36,
    "end": 658.94,
    "text": "training or anything like that.\nIt limits the"
  },
  {
    "start": 659.02,
    "end": 662.46,
    "text": "amount of useful stuff we can build on top\nof other"
  },
  {
    "start": 662.56,
    "end": 665.98,
    "text": "services and sites. And\nthat of course in turn, um,"
  },
  {
    "start": 666.84,
    "end": 670.58,
    "text": "might also hurt those sites because some\nuseful products from which they might"
  },
  {
    "start": 670.62,
    "end": 673.86,
    "text": "benefit then maybe won't get built."
  },
  {
    "start": 673.88,
    "end": 677.16,
    "text": "But of course I guess that\nis a price they're happy to pay"
  },
  {
    "start": 677.22,
    "end": 681.04,
    "text": "because either they'll get paid by these"
  },
  {
    "start": 681.08,
    "end": 684.75,
    "text": "people that build products on top of them,\nor at least they prevent that"
  },
  {
    "start": 684.78,
    "end": 688.7,
    "text": "data extraction. So yeah, uh, I expect"
  },
  {
    "start": 688.71,
    "end": 692.37,
    "text": "that we will see more sites and"
  },
  {
    "start": 692.48,
    "end": 695.88,
    "text": "services, uh, locking down their APIs."
  },
  {
    "start": 696.02,
    "end": 699.29,
    "text": "I think we'll see more sites becoming\npretty"
  },
  {
    "start": 699.319,
    "end": 702.96,
    "text": "protective about their data,\npretty aggressive"
  },
  {
    "start": 702.97,
    "end": 706.4,
    "text": "against crawlers, which\nis absolutely their right,"
  },
  {
    "start": 706.44,
    "end": 710.31,
    "text": "there. Um, but I think\nthat might also hurt"
  },
  {
    "start": 710.5,
    "end": 714.319,
    "text": "us as developer because it kind of limits\nthe, the stuff we"
  },
  {
    "start": 714.36,
    "end": 717.54,
    "text": "can build around other popular services\nand"
  },
  {
    "start": 717.64,
    "end": 721.22,
    "text": "sites. These are my two cents.\nWhat do you think about this"
  },
  {
    "start": 721.3,
    "end": 722.1,
    "text": "topic?"
  }
]