Perform minimal type checking for dictSet function #5456
base: main
Conversation
…/value pair to an existing map, instead of creating a new map from scratch
I like the idea behind this change! I think we might be able to simplify it a bit while keeping the performance improvements you introduced.
What do you think? (Stachu will be doing the actual review since he knows this best)
I think that's a good idea! I wasn't sure how to clean the code up, but I like your suggestion. It gave me the following idea for an implementation:

if Map.isEmpty map then
  // create from the [(key, value)] list as the existing commit does
else
  let (oldKey, oldVal) = Map.minKeyValue map
  // use oldKey and oldVal the same way the existing commit does next

I think that would simplify it by getting rid of the pattern matching, and we would only need one line to find the (key, value) pair instead of making two separate queries to the map as the commit currently does. If Stachu is fine with the general idea of this commit, I would be happy to update it to incorporate that suggestion.
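For illustration, here is a minimal, runnable sketch of that idea in plain F#, with Map<string, obj> standing in for the DvalMap and a .NET runtime-type comparison standing in for VT.merge; the names are made up and this is not the project's actual code:

let setChecked (map : Map<string, obj>) (key : string) (value : obj) : Map<string, obj> =
  if Map.isEmpty map then
    // empty dict: nothing to compare against, create from a one-element list
    Map.ofList [ (key, value) ]
  else
    // a single lookup gives one representative (key, value) pair to compare against
    let (_oldKey, oldVal) = Map.minKeyValue map
    if oldVal.GetType() <> value.GetType() then
      failwith $"value for key '{key}' does not match the dictionary's value type"
    Map.add key value map

// usage: the first call succeeds, the second raises on a mismatched value type
let ok = setChecked (Map.ofList [ ("a", box 1) ]) "b" (box 2)
let mismatch = try setChecked ok "c" (box "three") |> Some with _ -> None

The point is that the "type check" touches only one existing entry, so no O(n) traversal or list conversion is needed.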
I think this solution puts a lot of responsibility on the specific builtin function. Rather, I'd love to see a solution that could be easily re-used, with the majority of updates likely in TypeChecker.DvalCreator.

Right now, the relevant function looks like:

let dict
  (threadID : ThreadID)
  (typ : ValueType)
  (entries : List<string * Dval>)
  : Dval =

I think we could add another function

let dictAddEntry // or ...dictAddEntries
  (threadID : ThreadID)
  (typ : ValueType)
  (entries : DvalMap)
  (newEntry : string * Dval)
  : Dval =

refactoring the relevant functionality out of the existing dict function.

If this solution works well, a similar solution will likely be really useful for adding entries elsewhere, too. What do you think? Generally speaking, we'd like this kind of type-checking and entry-adding logic to be reusable rather than re-implemented in each builtin.
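For concreteness, a minimal runnable sketch (plain F#, with toy Map<string, obj> values standing in for Dvals; all names here are illustrative, not the real API) of how such a split could look: one add-entry helper owns the per-entry type check, and creating a dict from a list becomes a fold over it, so both paths reuse the same logic:

let addEntry (entries : Map<string, obj>) (key : string, value : obj) : Map<string, obj> =
  // compare the new value against one representative existing value, if any
  match Map.tryPick (fun _ v -> Some v) entries with
  | Some existing when existing.GetType() <> value.GetType() ->
    failwith $"value for key '{key}' does not match the dictionary's value type"
  | _ -> Map.add key value entries

// creating a dict from scratch is just a fold over addEntry
let dictOfList (pairs : (string * obj) list) : Map<string, obj> =
  List.fold addEntry Map.empty pairs

// usage
let d  = dictOfList [ ("a", box 1); ("b", box 2) ]
let d2 = addEntry d ("c", box 3)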
…, as long as it is not the minimum key in the dictionary; this conflicts with previous semantics of Dict.set)
…e (I changed it to True momentarily as the build command mentioned skipping tests, and I wanted to go through the tests)
I think this is a good idea and the latest commit follows it. (I have some tolerance for messy code 😅 but don't want to make things harder for others to maintain, so I appreciate the advice from you and Ocean on how to simplify the code further.)

There's a regression which I've added a test for; it's easily resolved, but I'm not sure what the maintainers' preferences are. The first of these tests passes fine. The second one would have passed before, but now it doesn't, because we only check against the minimum key/value pair instead of every element.

Stdlib.Dict.set_v0 (Dict { key1 = "val1before" }) "key1" "val1after" =
  Builtin.testDerrorMessage "Cannot add two dictionary entries with the same key `key1`"

Stdlib.Dict.set_v0 (Dict { key1 = "val1"; key2 = "val2" }) "key2" "newVal2" =
  Builtin.testDerrorMessage "Cannot add two dictionary entries with the same key `key2`"

The fix is easy: we can move up (or duplicate) the call to the exception-raising code below.

// exception-raising code in TypeChecker.DvalCreator.dict function
if Map.containsKey k entries then
  RTE.Dicts.Error.TriedToAddKeyAfterAlreadyPresent k
  |> RTE.Error.Dict
  |> raiseRTE threadID

I would personally prefer to put the if statement in the builtin (maybe put the exception-raising code in a new function so it doesn't need to be copy-pasted repeatedly). Some functions disallow "reinsertion of the same key" while others allow it. That's just my preference though. I'd appreciate a second opinion on where to put the duplicate-check, since Chesterton's fence is a thing and I'm not as familiar with the code.

Edit: I accidentally clicked "close with comment" instead of the other button, but hopefully it's not closed now.
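As a runnable toy sketch of the "put the exception-raising code in a new function" idea, using plain F# stand-ins for the RTE machinery (the exception type and function names here are made up):

exception DuplicateKey of string

let checkNoDuplicateKey (entries : Map<string, obj>) (key : string) : unit =
  if Map.containsKey key entries then
    raise (DuplicateKey $"Cannot add two dictionary entries with the same key `{key}`")

// a Dict.set-style path would call the check before adding the new entry
let setStrict (entries : Map<string, obj>) (key : string) (value : obj) : Map<string, obj> =
  checkNoDuplicateKey entries key
  Map.add key value entries

Functions that allow reinsertion would simply skip the call to the helper, so the check lives in one place.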
Hi there, thanks for continuing with this. Some thoughts:

let dictAddEntry // or ...dictAddEntries
  (threadID : ThreadID)
  (typ : ValueType)
  (entries : DvalMap)
  (newEntry : string * Dval)
  (shouldMergeUponDupeKey : bool) // new
  : Dval =
  let (k, v) = newEntry
  if Map.containsKey k entries && not shouldMergeUponDupeKey then
    RTE.Dicts.Error.TriedToAddKeyAfterAlreadyPresent k
    |> RTE.Error.Dict
    |> raiseRTE threadID
  let vt = Dval.toValueType v
  match VT.merge typ vt with
  | Ok merged -> DDict(merged, Map.add k v entries)
  | Error() ->
    RTE.Dicts.Error.TriedToAddMismatchedData(k, typ, vt, v)
    |> RTE.Error.Dict
    |> raiseRTE threadID

Am I missing some reason why that won't work?
That looks like it would be perfect actually, and cleaner than the way I previously thought of distinguishing duplicates. I think that kind of approach didn't come to mind before because of a misunderstanding I had of your original comment. I'll try implementing your suggestions tomorrow and see if there's anything to improve. (I think the only thing that sticks out to me is the new bool parameter, since the Darklang coding guidelines, if I remember correctly, prefer two-case discriminated unions.)
A 2-case DU would be fine. I was mostly lazy to come up with names :)
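A runnable toy sketch of the 2-case DU replacing the bool flag; the union name, case names, and Map<string, obj> stand-ins are illustrative only:

type DuplicateKeyBehavior =
  | RaiseOnDuplicate
  | OverrideDuplicate

let addEntryWith
  (behavior : DuplicateKeyBehavior)
  (entries : Map<string, obj>)
  (key : string)
  (value : obj)
  : Map<string, obj> =
  if Map.containsKey key entries && behavior = RaiseOnDuplicate then
    failwith $"Cannot add two dictionary entries with the same key `{key}`"
  else
    Map.add key value entries

// usage: the strict behavior raises on a duplicate key, the overriding behavior replaces the value
let strict  = addEntryWith RaiseOnDuplicate Map.empty "k" (box 1)
let relaxed = addEntryWith OverrideDuplicate strict "k" (box 2)

Compared with a bare bool, the DU makes call sites self-describing, which seems to be the point of the guideline mentioned above.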
… when new key already exists in Dict, or whether to raise an exception instead.
2. Add another builtin: Dict.setOverridingDuplicates
3. Make stdlib function "setOverridingDuplicates" in file "packages/darklang/stdlib/dict.dark" use the new builtin (it was previously implemented by removing the key from the dict and then setting the key/value pair, which is semantically equivalent)
4. Add a couple of additional tests for "Dict.setOverridingDuplicates", and amend the regression (introduced in the previous commit) in the test for "Dict.set" when setting a key/value pair that (is not the minimum key in the dict && the key already exists in the dict)
I didn't notice or understand before, but it looks like there was already a setOverridingDuplicates function in the Dark stdlib, which I only noticed after implementing the new builtin. I did also learn that the project uses different formatting settings for fantomas, which is nice to know (so I shouldn't just invoke fantomas with its defaults). I'm not entirely sure why CI is failing, as local tests in the container run fine, but it looks like I have more to learn there, which is kind of a plus. 😅
As far as I know, we want to keep most of the functionality written in Dark, and to move as many builtins as possible over to Dark. But there are sometimes exceptions, so Stachu will follow up here :)
btw, you can run
Sometimes some tests are flaky in CI; I'll rerun them and we'll see :)
What is the problem/goal being addressed?
There's a performance problem with the dictSet function: the dictionary is a map, which we convert to a list, only for the dictionary type-checking function to convert it back to a map. Suboptimal performance comes from two things: converting the map to a list and back, and type-checking every element of the dictionary on every call to the dictSet function.

What is the solution to this problem?
I think we can perform fewer operations while upholding type safety.
No need to check the type of every single element, or convert the map to a list, which are both O(n) operations.
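To make the difference concrete, here is a toy before/after sketch in plain F#, with Map<string, obj> standing in for the dictionary (illustrative only, not the project's code, and with the type checking omitted):

let setOld (map : Map<string, obj>) (key : string) (value : obj) : Map<string, obj> =
  // O(n): flatten the map to a list, append the new pair, and rebuild the whole map
  (Map.toList map @ [ (key, value) ]) |> Map.ofList

let setNew (map : Map<string, obj>) (key : string) (value : obj) : Map<string, obj> =
  // O(log n): insert directly into the existing persistent map
  Map.add key value map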
What are the changes here? How do they solve the problem and what other product impacts do they cause?
The commit only contains changes to a bit of purely functional code, and all the Dict.set tests which passed before still pass now.
I also added two new tests, adding different key/value pairs to empty dictionaries.
Has this information been included in the comments?
The general idea is commented.
Other notes
I wanted to add a test to check that a type error is raised when we create a dictionary whose values have one type and then set a key whose value has a different type, but I wasn't sure how to write multi-line tests as there didn't seem to be any in the files I looked at. I think my code won't introduce any regression in that regard, because it passes the exact same arguments to the type-checking function, except the list to type-check is shorter.
As an alternative to this approach, if we want to type-check N elements but not convert the whole map to a list, we can call Map.toSeq and iterate through a number of elements smaller than the length of the map, but I'm not sure what we gain with that approach.
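A small runnable sketch of that alternative (plain F#; the function name and Map<string, obj> element type are made up for illustration): Map.toSeq enumerates the map lazily, so Seq.truncate inspects only the first N entries without materializing the whole map as a list, and N = 1 recovers the Map.minKeyValue approach.

let firstNEntries (n : int) (map : Map<string, obj>) : (string * obj) list =
  map |> Map.toSeq |> Seq.truncate n |> List.ofSeq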