GitHub - bicknell/split-sum: Splitting a list of numbers into two parts that total the same in 20 languages.

bicknell / split-sum Public

Notifications You must be signed in to change notification settings
Fork 0
Star 0

Splitting a list of numbers into two parts that total the same in 20 languages.

Notifications

Name		Name	Last commit message	Last commit date
Latest commit History 39 Commits
src		src
Cargo.toml		Cargo.toml
LICENSE		LICENSE
Makefile		Makefile
README		README
benchmark-run.txt		benchmark-run.txt
split-sum-create.sql		split-sum-create.sql
split-sum-query.sql		split-sum-query.sql
split-sum.c		split-sum.c
split-sum.cpp		split-sum.cpp
split-sum.f90		split-sum.f90
split-sum.go		split-sum.go
split-sum.js		split-sum.js
split-sum.lsp		split-sum.lsp
split-sum.m		split-sum.m
split-sum.php		split-sum.php
split-sum.pl		split-sum.pl
split-sum.py		split-sum.py
split-sum.r		split-sum.r
split-sum.rb		split-sum.rb
split-sum.sh		split-sum.sh
split-sum.swift		split-sum.swift
split-sum.tcl		split-sum.tcl
split-sum.zsh		split-sum.zsh
splitSum.java		splitSum.java
split_sum.erl		split_sum.erl

Repository files navigation

History:

    The author had an interview with a coding component.  This was
    one of the problems.  The author had a choice of languages, and
    began to ponder which was best.

    At the same time, the author was curious what ChatGPT-4 could
    do in terms of generating code for other languages.  The author
    generated the first code by hand, and used ChatGPT to do initial
    translations to other languages.  About 1/3 of the ChatGPT
    versions worked as-is, while 2/3rds took some minor fixing and
    debugging.  All ChatGPT versions were looked over manually for
    optimizations.  In at least two cases it chose very inefficient
    methods that while working greatly slowed execution, these were
    manually fixed.

    The author is not fluent in all of these languages.  The author
    accepts patches that improve the programs using standard methods
    (e.g. no "special tricks").

Problem statement: 

    Given a list of integers determine if the list can be split at
    any point so that the sum of the items on both sides of the
    split are the same.  Return the left and right parts as new
    lists.

Test cases:

    []  ->  [[], []]
    [100]  ->  [[], []]
    [99, 99]  ->  [[99], [99]]
    [98, 1, 99]  ->  [[98, 1], [99]]
    [99, 1, 98]  ->  [[99], [1, 98]]
    [1, 2, 3, 0]  ->  [[1, 2], [3, 0]]
    [1, 2, 3, 5]  ->  [[], []]
    [1, 2, 2, 1, 0]  ->  [[1, 2], [2, 1, 0]]
    [10, 11, 12, 16, 17]  ->  [[10, 11, 12], [16, 17]]
    [1, 1, 1, 1, 1, 1, 6]  ->  [[1, 1, 1, 1, 1, 1], [6]]
    [6, 1, 1, 1, 1, 1, 1]  ->  [[6], [1, 1, 1, 1, 1, 1]]
    
    Note: The return [[], []] may be null/nill/error in some languages.

Software Needed:
  
    On an OSX Sonoma (14.1 or greater) system:

        - Pre-installed: Python 3, Perl 5, Ruby, TCL, ZSH, Bash
        - Install Xcode for: C, C++, Objective-C, Swift
        - Install Rust with rustup:
            - curl --proto '=https' --tlsv1.2 -sSf https://sh.rustup.rs | sh
        - Install Node.JS using installer at https://nodejs.org/en/download
        - Install Go using package at https://go.dev/doc/install
        - Install Java JDK using DMG at https://www.oracle.com/java/technologies/downloads/
        - Use HomeBrew at https://brew.sh for:
            - brew install clisp
            - brew install php
            - brew install postgressql
            - brew install r
            - brew install erlang

    Note, PostgreSQL must be started and stopped as needed:
        - brew services start postgresql@14
        - brew services stop postgresql@14

How to run:

    'make' will execute all compilation steps (same as 'make all').

    'make runall' will build (if necessary) and then execute tests in all languages.

Language specific notes:

    ZSH and BASH do not support functions returning values or pass
    by reference.  The values are printed to the screen and captured
    which is standard practice.

    Java requires the file name and class name to match, and the
    class name to not include a '-' character, resulting in the
    file being called splitSum.java.

    Rust needs to be build with cargo to bring in the lazy_static
    dependency.  Cargo requires a particular directory structure,
    so the rust source code is in src/main.rs, the executable is
    output to target/release/split-sum and the Cargo.toml file is
    in the main directory.

    C++, Objective-C, and Swift can all basically use the same
    implementation as C, but instead their OOP data structures
    were used to demonstrate the differences in performance.

    TCL global variables must be before any procedures, so the
    cases variable has been moved to the top of the file.

    ZSH and BASH are both slow enough that 100,000 runs takes too
    long.  The code has been modified to do 10,000 runs, and
    multiply the result by 10.

Arrays:

    Indexing: 

        Fortran R, Erlang, and ZSH both start arrays at 1 and index
        from 1 to <length>.  All other languages start at 0 and index
        from 0 to <length - 1>.

        In Fortran is it possible to start arrays at any index, a
        feature not used in this code.

        Fortran arrays are also "column major" unlike other
        languages.  The 'reshape' function is used to initialize
        cases so that the text in the code continues to be row-major
        like other languages.

    Slices

        C has no slice function, elements are copied with memcpy.

        Erlang no slice function, but it does have a list:split().

        Go, Rust, Python 3, PHP, C++, Objective-C, LISP, Java and
        JavaScript all return a list exclusive of the end position.
        E.g.  Slicing [ 1 2 3 4 5 ] with a start of 1 and an end
        of 3 results in [ 2 3 ].

        TCL, Ruby, Swift, Perl 5, R, PostgreSQL, BASH, all return
        a list inclusive of the end position.  E.g. Slicing 
        [ 1 2 3 4 5 ] with a start of 1 and an end of 3 results 
        in [ 2 3 4 ].

        Fortran, R, and ZSH all return a list inclusive of the end
        positio, but also start arrays at 1!!!!  E.g. Slicing 
        [ 1 2 3 4 5 ] with a start of 1 and an end of 3 results in
        [ 1 2 3 ].


LOC:

    Lines of Code is not a great measure of any programming language.
    It is also a measure that can be easily manipulated in many
    languages, for instance by using naked if statements insted of
    enclosing in braces.  Each of these attempted to write the code
    in the style most common in that language, more or less.

    Lines Filename              Lanugage
       58 split-sum.py          Python 3
       59 split-sum.lsp         LISP
       67 split_sum.erl         Erlang
       68 split-sum-create.sql  SQL (Two files together.)
          split-sum-query.sql
       69 split-sum.js          JavaScript
       70 split-sum.pl          Perl 5
       72 split-sum.rb          Ruby
       73 split-sum.php         PHP
       78 split-sum.tcl         TCL
       81 split-sum.swift       Swift
       82 split-sum.go          Go
       82 splitSum.java         Java
       83 split-sum.zsh         ZSH
       86 split-sum.sh          SH
       88 src/main.rs           Rust
       93 split-sum.m           Objective-C
      100 split-sum.cpp         C++
      111 split-sum.c           C
      136 split-sum.f90         Fortran

Results:

    The runall target runs 100,000 (10,000 for BASH and ZSH, which is
    then multiplied by 10 for consistency) iterations of the testCases
    function and outputs the time in seconds taken for all runs.

                               Times  Array Slice x:y returns
    Rank   Seconds Language    Slower Start entries from          Comment
    1       0.072  C             1.0  0     No slice function.    Built in arrays of int.
    2       0.114  Fortran       1.5  1     [x]   ... [y]         Built in arrays of integer.
    3       0.318  Java          4.4  0     [x]   ... [y-1]       Built in arrays, of int.
    4       0.415  Go            5.7  0     [x]   ... [y-1]       Built in arrays i32.
    5       0.445  Rust          6.1  0     [x]   ... [y-1]       Box of Box of i32.
    6       0.605  JavaScript    8.4  0     [x]   ... [y-1]       Built in arrays, of int.
    7       1.538  Erlang       21.3  1     list:split()          Built in lists, of int.
    8       1.917  TCL          26.6  0     [x]   ... [y]         Built in arrays untyped.
    9       1.952  C++          27.1  0     [x]   ... [y-1]       vector of vector of int.
    10      2.643  PHP          36.7  0     [x]   ... [y-1]       Built in arrays, untyped.
    11      2.910  Swift        40.4  0     [x]   ... [y]         Built in arrays of int.
    12      4.290  Objective-C  59.5  0     [x]   ... [y-1]       NSArray of NSArray of NSNumber.
    13      4.907  Ruby         68.1  0     [x]   ... [y]         Built in arrays, untyped.
    14      7.361  Python 3    102.2  0     [x]   ... [y-1]       Built in arrays, of int.
    15     17.650  Perl 5      245.1  0     [x]   ... [y]         Built in arrays, untyped.
    16     21.847  R           303.4  1     [x]   ... [y]         Built in lists, untyped.
    17     33.493  PostgreSQL  465.1  0     [x]   ... [y]         Built in arrays, of int.
    18     46.861  LISP        650.8  0     [x]   ... [y-1]       Built in lists, untyped.
    19    860.000  ZSH         11944  1     [x]   ... [y]         Built in arrays, untyped.
    20   2350.000  BASH        32638  0     [x]   ... [y]         Built in arrays, untyped.

    The file benchmark-run.txt contains the run used to make this table.

Bugs/Known Limitations:

    The code here is more for language comparison than to be fully optimized and
    hardended.  As a result the following shortcomings are known.
 
    C, Go, Rust, C++, and Swift all use "integer" arrays, typically 32 bit integers
    on most platforms.  The total values in the splitSum function are also the same
    data type, leading to an overflow possibility.

    C, C++, Objective-C and Swift all pass by reference (pointers) to the splitSum
    function, but do not check if these pointers are NULL/NIL before using them.

    In all languages it is assumed the dataset can fit into memory.

    It would be nice if the test cases were in a file and all programs read the same
    input.