CMU Database Group
  • 459
  • 3 640 927
S2024 #22 - Amazon Redshift Data Warehouse System (CMU Advanced Database Systems)
Andy Pavlo (www.cs.cmu.edu/~pavlo/)
Slides: 15721.courses.cs.cmu.edu/spring2024/slides/22-redshift.pdf
Notes: 15721.courses.cs.cmu.edu/spring2024/notes/22-redshift.pdf
15-721 Advanced Database Systems (Spring 2024)
Carnegie Mellon University
15721.courses.cs.cmu.edu/spring2024/
Views: 2,894

Videos

S2024 #21 - Yellowbrick Data Warehouse System (CMU Advanced Database Systems)
1.9K views, 1 month ago
Andy Pavlo (www.cs.cmu.edu/~pavlo/) Slides: 15721.courses.cs.cmu.edu/spring2024/slides/21-yellowbrick.pdf Notes: 15721.courses.cs.cmu.edu/spring2024/notes/21-yellowbrick.pdf 15-721 Advanced Database Systems (Spring 2024) Carnegie Mellon University 15721.courses.cs.cmu.edu/spring2024/
S2024 #20 - DuckDB Embedded Database System (CMU Advanced Database Systems)
4.6K views, 2 months ago
Andy Pavlo (www.cs.cmu.edu/~pavlo/) Slides: 15721.courses.cs.cmu.edu/spring2024/slides/20-duckdb.pdf Notes: 15721.courses.cs.cmu.edu/spring2024/notes/20-duckdb.pdf 15-721 Advanced Database Systems (Spring 2024) Carnegie Mellon University 15721.courses.cs.cmu.edu/spring2024/
S2024 #19 - Snowflake Data Warehouse Internals (CMU Advanced Database Systems)
4.1K views, 2 months ago
Andy Pavlo (www.cs.cmu.edu/~pavlo/) Slides: 15721.courses.cs.cmu.edu/spring2024/slides/19-snowflake.pdf Notes: 15721.courses.cs.cmu.edu/spring2024/notes/19-snowflake.pdf 15-721 Advanced Database Systems (Spring 2024) Carnegie Mellon University 15721.courses.cs.cmu.edu/spring2024/
S2024 #18 - Databricks Photon / Spark SQL (CMU Advanced Database Systems)
2.8K views, 2 months ago
Andy Pavlo (www.cs.cmu.edu/~pavlo/) Slides: 15721.courses.cs.cmu.edu/spring2024/slides/18-databricks.pdf Notes: 15721.courses.cs.cmu.edu/spring2024/notes/18-databricks.pdf 15-721 Advanced Database Systems (Spring 2024) Carnegie Mellon University 15721.courses.cs.cmu.edu/spring2024/
S2024 #17 - Google BigQuery / Dremel (CMU Advanced Database Systems)
2.6K views, 2 months ago
Andy Pavlo (www.cs.cmu.edu/~pavlo/) Slides: 15721.courses.cs.cmu.edu/spring2024/slides/17-bigquery.pdf Notes: 15721.courses.cs.cmu.edu/spring2024/notes/17-bigquery.pdf 15-721 Advanced Database Systems (Spring 2024) Carnegie Mellon University 15721.courses.cs.cmu.edu/spring2024/
S2024 #15 - Query Optimizer Implementation 3 (CMU Advanced Database Systems)
1.2K views, 2 months ago
Andy Pavlo (www.cs.cmu.edu/~pavlo/) Slides: 15721.courses.cs.cmu.edu/spring2024/slides/15-optimizer3.pdf Notes: 15721.courses.cs.cmu.edu/spring2024/notes/15-optimizer3.pdf 15-721 Advanced Database Systems (Spring 2024) Carnegie Mellon University 15721.courses.cs.cmu.edu/spring2024/
S2024 #14 - Query Optimizer Implementation 2 (CMU Advanced Database Systems)
1.3K views, 2 months ago
Andy Pavlo (www.cs.cmu.edu/~pavlo/) Slides: 15721.courses.cs.cmu.edu/spring2024/slides/14-optimizer2.pdf Notes: 15721.courses.cs.cmu.edu/spring2024/notes/14-optimizer2.pdf 15-721 Advanced Database Systems (Spring 2024) Carnegie Mellon University 15721.courses.cs.cmu.edu/spring2024/
S2024 #13 - Query Optimizer Implementation 1 (CMU Advanced Database Systems)
2.1K views, 2 months ago
Andy Pavlo (www.cs.cmu.edu/~pavlo/) Slides: 15721.courses.cs.cmu.edu/spring2024/slides/13-optimizer1.pdf Notes: 15721.courses.cs.cmu.edu/spring2024/notes/13-optimizer1.pdf 15-721 Advanced Database Systems (Spring 2024) Carnegie Mellon University 15721.courses.cs.cmu.edu/spring2024/
S2024 #12 - Database Networking Protocols (CMU Advanced Database Systems)
2K views, 3 months ago
Andy Pavlo (www.cs.cmu.edu/~pavlo/) Slides: 15721.courses.cs.cmu.edu/spring2024/slides/12-networking.pdf Notes: 15721.courses.cs.cmu.edu/spring2024/notes/12-networking.pdf 15-721 Advanced Database Systems (Spring 2024) Carnegie Mellon University 15721.courses.cs.cmu.edu/spring2024/
S2024 #11 - User-Defined Function Optimizations (CMU Advanced Database Systems)
1.3K views, 3 months ago
Andy Pavlo (www.cs.cmu.edu/~pavlo/) Slides: 15721.courses.cs.cmu.edu/spring2024/slides/11-udfs.pdf Notes: 15721.courses.cs.cmu.edu/spring2024/notes/11-udfs.pdf 15-721 Advanced Database Systems (Spring 2024) Carnegie Mellon University 15721.courses.cs.cmu.edu/spring2024/
S2024 #10 - Multi-Way Join Algorithms / Worst-Case Optimal Joins (CMU Advanced Database Systems)
1.6K views, 3 months ago
Andy Pavlo (www.cs.cmu.edu/~pavlo/) Slides: 15721.courses.cs.cmu.edu/spring2024/slides/10-multiwayjoins.pdf Notes: 15721.courses.cs.cmu.edu/spring2024/notes/10-multiwayjoins.pdf 15-721 Advanced Database Systems (Spring 2024) Carnegie Mellon University 15721.courses.cs.cmu.edu/spring2024/
S2024 #09 - Parallel Hash Join Algorithms (CMU Advanced Database Systems)
2K views, 3 months ago
Andy Pavlo (www.cs.cmu.edu/~pavlo/) Slides: 15721.courses.cs.cmu.edu/spring2024/slides/09-hashjoins.pdf Notes: 15721.courses.cs.cmu.edu/spring2024/notes/09-hashjoins.pdf 15-721 Advanced Database Systems (Spring 2024) Carnegie Mellon University 15721.courses.cs.cmu.edu/spring2024/
S2024 #08 - Query Scheduling & Coordination (CMU Advanced Database Systems)
1.8K views, 3 months ago
Andy Pavlo (www.cs.cmu.edu/~pavlo/) Slides: 15721.courses.cs.cmu.edu/spring2024/slides/08-scheduling.pdf Notes: 15721.courses.cs.cmu.edu/spring2024/notes/08-scheduling.pdf 15-721 Advanced Database Systems (Spring 2024) Carnegie Mellon University 15721.courses.cs.cmu.edu/spring2024/
S2024 #07 - JIT Query Compilation & Code Generation (CMU Advanced Database Systems)
2.2K views, 4 months ago
Andy Pavlo (www.cs.cmu.edu/~pavlo/) Slides: 15721.courses.cs.cmu.edu/spring2024/slides/07-compilation.pdf Notes: 15721.courses.cs.cmu.edu/spring2024/notes/07-compilation.pdf 15-721 Advanced Database Systems (Spring 2024) Carnegie Mellon University 15721.courses.cs.cmu.edu/spring2024/
S2024 #06 - Vectorized Query Execution Using SIMD (CMU Advanced Database Systems)
2.5K views, 4 months ago
S2024 #05 - Query Execution & Processing Part 2 (CMU Advanced Database Systems)
2.3K views, 4 months ago
S2024 #04 - Query Execution & Processing Part 1 (CMU Advanced Database Systems)
3.4K views, 4 months ago
S2024 #03 - Data Formats & Encoding Part 2 (CMU Advanced Database Systems)
3.6K views, 4 months ago
S2024 #02 - Data Formats & Encoding Part 1 (CMU Advanced Database Systems)
6K views, 4 months ago
S2024 #01 - Modern OLAP Database Systems (CMU Advanced Database Systems)
10K views, 4 months ago
S2024 #00 - Course Overview & Logistics (CMU Advanced Database Systems)
10K views, 5 months ago
F2023 #25 - Potpourri: Redis, CockroachDB, Snowflake, MangoDB, TabDB (CMU Intro to Database Systems)
5K views, 6 months ago
F2023 #24 - SingleStore Database Overview (CMU Intro to Database Systems)
2.7K views, 6 months ago
F2023 #23 - Distributed Data Warehouse OLAP Databases (CMU Intro to Database Systems)
4.1K views, 6 months ago
Chroma Vector Database: Retrieval for LLMs (Hammad Bashir + Liquan Pei)
2.6K views, 6 months ago
F2023 #22 - Distributed Transaction Processing Databases (CMU Intro to Database Systems)
3.4K views, 6 months ago
pgvector: Stylish Hierarchical Navigable Small World Indexes (Jonathan Katz)
3.2K views, 6 months ago
F2023 #21 - Intro to Distributed Databases (CMU Intro to Database Systems)
6K views, 6 months ago
F2023 #20 - Database Recovery (CMU Intro to Database Systems)
2.7K views, 7 months ago

COMMENTS

  • @digitulized459
    @digitulized459 19 hours ago

    Either that blockchain guy is a troll or that was one of the most entitled douches I've ever seen at a lecture.

  • @m.imranzaheer1368
    @m.imranzaheer1368 22 hours ago

    superb bro. Loved ur lecture

  • @chenqiang19860101
    @chenqiang19860101 1 day ago

    For the log structure, if we still need an index for lookups, how do we store the index? How does updating that index not end up causing random I/O?

  • @aliasonline1493
    @aliasonline1493 2 days ago

    really well explained! thank you!

  • @akashkulkarni832
    @akashkulkarni832 3 days ago

    what is the outro song??

  • @njgarg
    @njgarg 4 days ago

    Why is the "lost updates" anomaly missing in the discussion of isolation levels?

  • @tylerrongione6696
    @tylerrongione6696 6 days ago

    this is f*cking awesome

  • @njgarg
    @njgarg 7 days ago

    Great lecture, but in this specific one the camera moves too much and the quality is not HD.

  • @user-wh3ql8lu9v
    @user-wh3ql8lu9v 9 days ago

    Thank you for the lecture. Can you please speak a little slower next time or add subtitles? The microphone doesn't do justice to your enunciation of certain words.

  • @indavarapuaneesh2871
    @indavarapuaneesh2871 9 days ago

    insightful lecture

  • @jauhararifin10
    @jauhararifin10 11 days ago

    At 1:20:32, Oracle/MySQL and Postgres don't use memory as the primary storage, do they? And even so, Oracle/MySQL still beat most in-memory DBMSs? Is it because their WAL was disabled for this benchmark?

  • @oz5219
    @oz5219 11 days ago

    I have a question about the AllocatePage method in bustub when doing project 1 (I can't paste a link because it keeps getting deleted). The way it retrieves a new page id is just to return next_page_id_++, with next_page_id_ initialized to 0? Shouldn't it consider which page ids are already used on disk? I looked around the codebase but did not find any code that keeps track of the page ids used on disk. Please let me know if I missed anything, thanks a lot.

    • @oz5219
      @oz5219 11 days ago

      A follow-up question: judging from the ReadPage method, the page id is basically the physical index on disk. E.g., if the page id is 10, the method will seek to the 10th block of the page file. Is this an industry practice? To me it seems more reasonable to keep metadata that maps each page id to its physical offset.

    • @andypavlo
      @andypavlo 11 days ago

      15445.courses.cs.cmu.edu/fall2023/faq.html#q8
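
The direct-addressing scheme described in the question above (page id 10 lives in the 10th block of the file) can be sketched as follows. This is only an illustration, not BusTub's actual code: `PAGE_SIZE`, `page_offset`, and `read_page` are hypothetical names, and the 4096-byte page size is an assumption.

```python
import io

PAGE_SIZE = 4096  # assumed page size in bytes, for illustration only


def page_offset(page_id: int, page_size: int = PAGE_SIZE) -> int:
    """Byte offset of a page when the page id is its position in the file."""
    if page_id < 0:
        raise ValueError("page id must be non-negative")
    return page_id * page_size


def read_page(db_file, page_id: int, page_size: int = PAGE_SIZE) -> bytes:
    """Seek straight to the page; no id-to-offset lookup table is involved."""
    db_file.seek(page_offset(page_id, page_size))
    data = db_file.read(page_size)
    # A read past the end of the file comes back short; pad it with zero bytes.
    return data.ljust(page_size, b"\x00")


# Usage: an in-memory "file" holding 11 pages, each filled with its own id.
fake_disk = io.BytesIO(b"".join(bytes([i]) * PAGE_SIZE for i in range(11)))
page_10 = read_page(fake_disk, 10)
```

The trade-off raised in the question is visible here: the offset is a single multiplication, but a page can never move without changing its id, which is what a separate id-to-offset map would allow.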

  • @armsofundertow98
    @armsofundertow98 18 days ago

    This is how you do it in Oracle, btw: SELECT TO_DATE('2023-08-30', 'YYYY-MM-DD') - TO_DATE('2023-01-01', 'YYYY-MM-DD') AS days FROM DUAL;
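
As a cross-check on the date arithmetic in the comment above, the same day difference can be computed outside the database. This is a small sketch in Python; `days_between` is a hypothetical helper name, not something from the lecture.

```python
from datetime import date


def days_between(start: date, end: date) -> int:
    """Whole days from start to end (negative if end is earlier)."""
    return (end - start).days


# Same two dates as in the Oracle query above.
days = days_between(date(2023, 1, 1), date(2023, 8, 30))
print(days)  # 241
```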

  • @user-lv2ht3qv2l
    @user-lv2ht3qv2l 19 days ago

    thanks a lot!

  • @badrphone2393
    @badrphone2393 22 days ago

    Wouldn't SQL do a linear scan when searching for a tuple?

  • @ishanrawat5308
    @ishanrawat5308 29 days ago

    The lectures are undoubtedly brilliant. But, I really really love the intro and the outro too. What a production! So cool.

  • @bavideomaker29
    @bavideomaker29 1 month ago

    What about the pure MOLAPs and financial modeling databases like Essbase, Hyperion, and TM1? Plus Cognos PowerPlay and SQL Server Analysis Services?

  • @user-vo2bt7ex5c
    @user-vo2bt7ex5c 1 month ago

    Just a clarification on Sybase IQ: this was/is a fully fledged columnar store, not just an in-memory accelerator. Sybase did build a product like that, called RAP (Realtime Analytics Platform), where they used their row store in memory and IQ as a backing column store for analytics.

  • @user-lv2ht3qv2l
    @user-lv2ht3qv2l 1 month ago

    thanks a lot.

  • @prakhargupta1224
    @prakhargupta1224 1 month ago

    8:41

  • @mannana8550
    @mannana8550 1 month ago

    kraska is not a good guy

  • @user-mg8yq4xp1v
    @user-mg8yq4xp1v 1 month ago

    Is this course any different from the one from 2018?

  • @theghost9362
    @theghost9362 1 month ago

    Just started the course, and I can tell it's gonna be thick haha

  • @syphiliticpangloss
    @syphiliticpangloss 1 month ago

    Missing is an understanding of the optimizer and how to debug things. When things go wrong, when predicates are not pushed down. You ask why? And you have nothing.

  • @amir-ali8850
    @amir-ali8850 1 month ago

    audio really is bothering me but i have to learn

  • @user-lv2ht3qv2l
    @user-lv2ht3qv2l 1 month ago

    thanks a lot!

  • @mullinsms
    @mullinsms 1 month ago

    I'm surprised that Netezza didn't get a mention among the mid-2000s Postgres-based OLAP systems. It's very relevant because the AQUA FPGA concept was already in use in the Netezza appliances back in the mid-2000s. AQUA had super poor adoption and they never announced that it went away.

  • @MohamedFouad-vt2lm
    @MohamedFouad-vt2lm 1 month ago

    thanks

  • @padam_discussion
    @padam_discussion 1 month ago

    Nice explanation of Redshift. I could not understand the primitives that are supported only by Redshift and not by other databases. When I search, I see that Snowflake supports all 8 primitives but does not support something related to arrays. Could you please explain more on this? Thanks.

  • @rangelspasov
    @rangelspasov 1 month ago

    DPDK sounds like an in-house version of QUIC for reliable data transmission over UDP en.wikipedia.org/wiki/QUIC

  • @fg746
    @fg746 1 month ago

    Nice to see you have moved Bloom filters from last year's join algorithms section to the general hash tables lecture, cool stuff

  • @user-lv2ht3qv2l
    @user-lv2ht3qv2l 1 month ago

    thanks a lot. really appreciated.

  • @asadawadia
    @asadawadia 1 month ago

    needs more details on how group by is done in volcano/iterator model

  • @en1766
    @en1766 1 month ago

    @31:08 correction: dataframes came from S, then R, and then pandas. Wes talks about how he stole them for Ibis/Arrow/pandas in another video I'm forgetting. Sincerely, - R user

    • @bigstone3099
      @bigstone3099 1 month ago

      Hadley Wickham was the inspiration for us all in data science, starting first in R, as the statistics world was mostly there. Wes just allowed Python to catch up or, in a sense, popularized the word of Hadley God in a more all-purpose language.

  • @wangpixu234
    @wangpixu234 1 month ago

    wow the blockchain idiot!! 😮 I really can't believe there are people like these in CMU😵‍💫

  • @user-lv2ht3qv2l
    @user-lv2ht3qv2l 1 month ago

    thanks a lot.

  • @SteveLoughran
    @SteveLoughran 1 month ago

    Another little detail: Spark can delegate saving of shuffle data to the Hadoop YARN NodeManager process, which can serve data even after the Spark worker process terminates. This allows for more agile Spark clusters within a Hadoop cluster. However, with the move to Kubernetes container hosting, Spark serves the data itself and assumes that it won’t terminate. This is potentially a problem with deployment on spot-priced cloud VMs, as the “your server is fairly reliable” assumption no longer holds.…

  • @NickEllis-nr6ot
    @NickEllis-nr6ot 1 month ago

    Excellent explanations/lecture Prof Andy!

  • @vicsteiner
    @vicsteiner 1 month ago

    Just don't use NULL: there can't be a data type for an absence of data (even more contradictory is the absurdity of a phrase like "null value"). The relational data model itself does not have NULL; it is based on two-valued logic, not three-valued logic. I do think this is an awesome course, but in general it does not properly observe the theoretical background of the relational data model.

  • @youssefackraman6194
    @youssefackraman6194 1 month ago

    Thanks, Dr. Andy. I wish you eternal happiness.

  • @TobiasFrei
    @TobiasFrei 1 month ago

    To my knowledge IDS has been used in industry for a long time. See Bull's GCOS with IDS/II.

  • @cexploreful
    @cexploreful 1 month ago

    GREAT PRESENTATION! VERY CLEAR! 🧠🧠🧠🧠🧠🧠🧠🧠🧠🧠🧠🧠🧠🧠🧠🧠🧠🧠🧠🧠🧠🧠🧠🧠🧠🧠🧠🧠🧠🧠🧠🧠🧠🧠🧠🧠🧠🧠🧠🧠🧠🧠🧠🧠🧠🧠🧠🧠🧠🧠🧠🧠🧠🧠🧠🧠🧠🧠🧠🧠🧠🧠🧠🧠🧠🧠🧠🧠🧠🧠🧠🧠🧠🧠🧠🧠🧠🧠🧠🧠🧠🧠🧠🧠🧠🧠🧠🧠🧠🧠🧠🧠🧠🧠🧠🧠🧠🧠🧠🧠🧠🧠🧠🧠🧠🧠🧠🧠🧠🧠🧠🧠🧠🧠🧠🧠🧠🧠🧠🧠🧠🧠🧠🧠🧠🧠🧠🧠🧠🧠🧠🧠🧠🧠🧠🧠🧠🧠🧠🧠🧠🧠🧠🧠🧠🧠🧠🧠🧠🧠🧠🧠🧠🧠

  • @Brandon-youtube
    @Brandon-youtube 2 months ago

    I don’t work with Snowflake in my current role, but I think about the white paper frequently. “Cloud native” is thrown around so loosely, but it wasn’t until that white paper that I had seen something that was truly cloud native from start to finish and was only possible with the scale and flexibility of modern cloud providers.

  • @cexploreful
    @cexploreful 2 months ago

    The throughput of content Justin packs into his talk is impressive! Very smooth. Happy to have learned about BigQuery.

  • @filmfranz
    @filmfranz 2 months ago

    Second, distributed, here to stay

  • @gigiduru125
    @gigiduru125 2 months ago

    Does anyone know how they handle security in JIT systems like HyPer/Umbra? For example, how can they stop someone from getting the compiler to generate code that reads or writes the system's memory?

  • @hubstrangers3450
    @hubstrangers3450 2 months ago

    These lectures should've followed HDFS....2011...instead of 2024.....with Compute/Security/Cost/etc, "Cloud should gradually depreciate", doesn't need to join the "Taxing Community..."

  • @prateek1317
    @prateek1317 2 months ago

    What is the worst idea?

  • @brucem8448
    @brucem8448 2 months ago

    Here first, fastest. I must be vectorized.

  • @agarbanzo360
    @agarbanzo360 2 months ago

    Here first, fastest. I must be vectorized.