PostgreSQL introduced the JSONB type in 9.4 with considerable celebration. (Well, about as much as you can expect for a new data type in an RDBMS.) It’s a wonderful feature: a format that lets you store blobs in the lingua franca of modern web services, without requiring re-parsing whenever you want to access a field, and in a way that enables indexing for complicated predicates like containment of other JSON blobs. It meaningfully extends PostgreSQL and makes it a viable choice for a lot of document store workflows.

Recently, I’ve gotten a few questions about the benefits and drawbacks of using JSONB to store the entirety of a table – why have anything but an id and a data blob? We lean on JSONB heavily at Heap, and it’s a natural fit, as we have APIs that allow customers to attach arbitrary properties to events we collect. And it fits nicely in a startup engineering context: just add a properties column to the end of your table for all the other attributes you might want to store down the road, and your schema is now officially Future Proof™.

The idea of not having to explicitly manage a schema appeals to a lot of people, so it shouldn’t be surprising to see JSONB used this way. But there are considerable performance costs to doing so, some of which aren’t immediately obvious. There is great material for deciding which of JSON, JSONB, or hstore is right for your project, but the correct choice is often “none of the above.”* Here are a few reasons why.

Hidden Cost #1: Slow Queries Due To Lack Of Statistics

For traditional data types, PostgreSQL stores statistics about the distribution of values in each column of each table, such as a histogram sketch of the distribution of values in the column (for ordered types). For a given query, the query planner uses these statistics to estimate which execution plan will be the fastest. In their absence, the planner can’t determine which join algorithms, join orders, or scan types will make your query fast.

For example, let’s make a table with 1 million “measurements” of three values, each chosen at uniform random from {0, 1}. Each measurement was taken by one of 10,000 scientists, and each scientist comes from one of three labs:
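The setup code isn’t preserved in this copy of the post, so here is a minimal sketch consistent with the description above. The table names measurements and scientist_labs and the columns scientist_id, lab_name, and value_1/value_2/value_3 come from the prose; the lab names and the exact DDL are assumptions:

```sql
-- A sketch of the schema described above (assumed, not the post's original DDL).
CREATE TABLE scientist_labs (
  scientist_id integer PRIMARY KEY,
  lab_name     text NOT NULL
);

CREATE TABLE measurements (
  id           serial PRIMARY KEY,
  scientist_id integer NOT NULL,
  value_1      integer NOT NULL,
  value_2      integer NOT NULL,
  value_3      integer NOT NULL
);

-- 10,000 scientists, each assigned to one of three labs.
INSERT INTO scientist_labs (scientist_id, lab_name)
SELECT i, (ARRAY['lab_1', 'lab_2', 'lab_3'])[1 + i % 3]
FROM generate_series(1, 10000) AS i;

-- 1 million measurements; each value is 0 or 1, uniformly at random.
INSERT INTO measurements (scientist_id, value_1, value_2, value_3)
SELECT 1 + floor(random() * 10000)::int,
       floor(random() * 2)::int,
       floor(random() * 2)::int,
       floor(random() * 2)::int
FROM generate_series(1, 1000000);

ANALYZE;  -- gather the column statistics the planner relies on
```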
Let’s say we want to get the tick marks in which all three values were 0 - which should be about 1/8th of them - and see how many times each lab was represented amongst the corresponding scientists. Our query will look something like this:
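The query itself is missing from this copy; a sketch that matches the description, using the same assumed names as above:

```sql
SELECT lab_name, count(*)
FROM measurements m
JOIN scientist_labs s ON s.scientist_id = m.scientist_id
WHERE m.value_1 = 0
  AND m.value_2 = 0
  AND m.value_3 = 0
GROUP BY lab_name;
```

Running it under EXPLAIN ANALYZE is what produces the plan discussed next.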
The resulting query plan is what we’d hope to see: the planner knows from our table statistics that about 1/8th of the rows in measurements will have value_1, value_2, and value_3 equal to 0, so about 125,000 of them will need to be joined with a scientist’s lab, and the database does so via a hash join. That is, load the contents of scientist_labs into a hash table keyed on scientist_id, scan through the matching rows from measurements, and look each one up in the hash table by its scientist_id value. The execution is fast - about 300 ms on my machine.

Now let’s say we instead store our measurements as JSONB blobs, like this:
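Again, the original block isn’t preserved; a plausible JSONB equivalent of the setup (the table name measurements_json and the use of json_build_object are assumptions) would be:

```sql
-- JSONB variant of the measurements table (a sketch, not the post's exact DDL).
CREATE TABLE measurements_json (
  id     serial PRIMARY KEY,
  record jsonb NOT NULL
);

-- Same 1 million rows, packed into JSONB blobs.
-- (json_build_object exists on 9.4; jsonb_build_object arrived in 9.5.)
INSERT INTO measurements_json (record)
SELECT json_build_object(
         'scientist_id', 1 + floor(random() * 10000)::int,
         'value_1',      floor(random() * 2)::int,
         'value_2',      floor(random() * 2)::int,
         'value_3',      floor(random() * 2)::int
       )::jsonb
FROM generate_series(1, 1000000);

ANALYZE;
```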
The analogous read query would look like this:
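A sketch of that query, under the same assumed names. Note that every field access now goes through the jsonb operators -> and ->>:

```sql
SELECT lab_name, count(*)
FROM measurements_json m
JOIN scientist_labs s
  ON s.scientist_id = (m.record ->> 'scientist_id')::int
WHERE m.record -> 'value_1' = '0'
  AND m.record -> 'value_2' = '0'
  AND m.record -> 'value_3' = '0'
GROUP BY lab_name;
```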
The performance is dramatically worse - a whopping 584 seconds on my laptop, about 2000x slower.

The underlying reason is that PostgreSQL doesn’t know how to keep statistics on the values of fields within JSONB columns. It has no way of knowing, for example, that record -> 'value_2' = 0 will be true about 50% of the time, so it relies on a hardcoded estimate of 0.1%. So, it estimates that 0.1% of 0.1% of 0.1% of the measurements table will be relevant (which it rounds up to ~1 row). As a result, it chooses a nested loop join: for each row in measurements that passes our filter, look up the corresponding lab_name in scientist_labs via the primary key of the latter table. But since there are ~125,000 such measurements, instead of ~1, this turns out to take an eternity.**

As always, accurate statistics are a critical ingredient to good database performance. This is one of the hidden costs of JSONB: your data doesn’t have statistics, so the query planner is flying blind. The result is that innocent queries will blow up on you. This caused production issues for us, and the only way to get around them was to disable nested loops entirely as a join option, with a global setting of enable_nestloop = off (sketched below). Ordinarily, you should never do something like that.

This probably won’t bite you in a key-value / document-store workload, but it’s easy to run into this if you’re using JSONB along with analytical queries.
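For completeness, the global workaround mentioned above is a single setting. It is a blunt instrument - it disables nested loop joins for every query on the system - so treat it as a last resort:

```sql
-- Disable nested loop joins for the current session.
SET enable_nestloop = off;

-- Or persist it server-wide (PostgreSQL 9.4+):
ALTER SYSTEM SET enable_nestloop = off;
SELECT pg_reload_conf();
```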