Skip to content

Latest commit

 

History

History
692 lines (539 loc) · 26.3 KB

interview.md

File metadata and controls

692 lines (539 loc) · 26.3 KB

Interview Questions

Collective interview questions on data structure and algorithms

Contents


Languages

Error Handling

  • Deal with exceptions in Python

  • Use try/except in Python

  • Catch all exceptions (NOT recommended)

    try:
        do_some_func()
    except BaseException as ex:  # catch *all* exceptions; same as bare `except:`
        # will catch all including `SystemExit`, `KeyboardInterrupt`, or `GeneratorExit`.
        pass  # potential causing to hang
  • Find out the exception name

    try:
        do_some_func()
        raise IndexError # as test error
    except Exception as ex:
        # won't catch `SystemExit`, `KeyboardInterrupt`, or `GeneratorExit`.
        print(type(ex).__name__) # print the name of the exception
  • Print traceback

    import traceback
    try:
        sqrt = 0**-1
    except Exception as ex:
        print(ex)
        print(traceback.print_exc())

Multi-threading

Code Online

Others


Data Structure

Array/List

  • Check if two strings are anagrams of each other.
  • Check if a string or a sentence is palindrome char by char, or word by word.
  • Check if one string contains all characters in another string.
  • Find all permutations of a String.
  • Find the duplicate number in a given integer list.
  • Find the largest and smallest in a given unsorted integer list.
  • Find the missing number in a given unsorted integer list, e.g. from 1~100.
  • Find all pairs, in an integer list, whose sum is equal to a given number.
  • Reverse order of words in a sentence.

Linked List

  • Find the middle element in a singly linked list (in one pass).
  • Check if a linked list contains a cycle. Find the starting node of the cycle.
  • Reverse a linked list.

Queue and Stack

  • How to implement a queue using a stack?
  • Implement a quick_sort function without recursion.

Tree and Trie

Binary Tree

  • A binary tree is a tree data structure in which each parent node can have at most two children.

Full Binary Tree

  • A full Binary tree is a special type of binary tree in which every parent node has either two or no children.

Complete Binary tree

  • Every level must be completely filled
  • All the leaf elements must lean towards the left.
  • The last leaf element might not have a right sibling i.e. a complete binary tree doesn't have to be a full binary tree.

Heap

  • A special tree-based data structure.
  • A complete binary tree.
  • All nodes in the tree follow the property that they are greater than their children i.e. the largest element is at the root and both its children and smaller than the root and so on. Such a heap is called a max-heap. If instead all nodes are smaller than their children, it is called a min-heap.

Problems:

  • Find if one B-tree contains another B-tree.
  • Perform in-order traversal without recursion.
  • Perform pos-torder traversal without recursion.
  • Perform pre-order traversal in a given binary tree.

Graph

  • Detect a cycle in a graph.


Design questions

OOP

  • Designing a simple card game.
    • How to store the order of all the players (data structures)
    • How to break it up into classes (card, game, player, etc)
    • Create properties/fields and functions/methods.

Database Design

Concepts

  • ACID
    • Atomicity: Transactions are often composed of multiple statements. Atomicity guarantees that each transaction is treated as a single "unit", which either succeeds completely, or fails completely: if any of the statements constituting a transaction fails to complete, the entire transaction fails and the database is left unchanged. An atomic system must guarantee atomicity in each and every situation, including power failures, errors and crashes.

    • Consistency: ensures that a transaction can only bring the database from one valid state to another, maintaining database invariants: any data written to the database must be valid according to all defined rules, including constraints, cascades, triggers, and any combination thereof. This prevents database corruption by an illegal transaction, but does not guarantee that a transaction is correct. Referential integrity guarantees the primary key - foreign key relationship.

    • Isolation: Transactions are often executed concurrently (e.g., multiple transactions reading and writing to a table at the same time). Isolation ensures that concurrent execution of transactions leaves the database in the same state that would have been obtained if the transactions were executed sequentially. Isolation is the main goal of concurrency control; depending on the method used, the effects of an incomplete transaction might not even be visible to other transactions.

    • Durability: guarantees that once a transaction has been committed, it will remain committed even in the case of a system failure (e.g., power outage or crash). This usually means that completed transactions (or their effects) are recorded in non-volatile memory.

SQL Query

  • For DB tables:

    -- containers(id, name);
    CREATE TABLE IF NOT EXISTS containers (
      `id` INT NOT NULL AUTO_INCREMENT COMMENT 'PK for containers',
      `name` VARCHAR(45) NULL COMMENT 'container name',
      PRIMARY KEY (`id`)
    );
    
    -- items(id, type);
    CREATE TABLE IF NOT EXISTS items (
      `id` INT NOT NULL AUTO_INCREMENT COMMENT 'PK for items',
      `type` VARCHAR(45) NULL COMMENT 'Item type',
      PRIMARY KEY (`id`)
    );
    
    -- containeritem(container_id, item_id);
    CREATE TABLE IF NOT EXISTS containeritem (
      `container_id` INT NOT NULL COMMENT 'Composite PK: FK to containers.id',
      `item_id` INT NOT NULL COMMENT 'Composite PK: FK to items.id',
      CONSTRAINT `FK__containeritem_container_id`
        FOREIGN KEY (`container_id`)
        REFERENCES `containers` (`id`),
      CONSTRAINT `FK__containeritem_item_id`
        FOREIGN KEY (`item_id`)
        REFERENCES `items` (`id`)
    );

    Write a SQL query to list all container names, item types, and count of types (including zero) for each container.

    SELECT d.name, d.type,
      CASE WHEN ci.item_id IS NULL THEN 0 ELSE count(ci.item_id) END as 'count'
      FROM (
      SELECT c.id as 'c_id', c.name, i.id as 'i_id', i.type
        FROM containers c
       CROSS JOIN items i
      ) d  -- optionally to use CTE
     LEFT OUTER JOIN containeritem ci ON d.i_id = ci.item_id
      AND d.c_id = ci.container_id
     GROUP BY d.name, d.type
     ORDER BY d.name, d.type
    ;

    See http://sqlfiddle.com/#!9/20fa2b/52

System Design

Checklist

  • A.B.C.D.

    • **A**sk questions (what features? how much to scale? ...)
    • Don't use **b**uzzword
    • **C**lear and organized thinking
    • **D**rive discussion :_**)
  • Questions to ask

    • Features (scope)
    • Define APIs (who's calling the interface and how)
    • Availability (host/site down)
    • Latency performance (faster with cache for customer-facing service)
    • Scalability (from hundreds of users to thousands, millions, ...)
    • Durability (secured without data loss nor being compromised)
    • Class diagram
    • Security and privacy
    • Cost effective
    • Usability
  • Design philosophy

    • Scalability

      • vertical (more cpu, mem, storage on the same host, has limitation)
      • horizontal (adding more hosts)
    • CAP theorem A distributed database system can only have 2 of the 3:

      • Consistency: every read receives a recent write or an error. (note difference in ACID)
      • Availability: every request receives a (non-error) response - without guarantee that it contains the most recent write.
      • Partition tolerance: The system continues to operate despite an arbitrary number of messages being dropped (or delayed) by the network between nodes.
        2 options on partition failure:
        • Cancel the operation to decrease the availability but ensure consistency.
        • Proceed with the operation and thus provide availability but risk inconsistency.
        • Note: such trade-off only apply on network partition or failure.
    • ACID vs BASE

      • BASE (Basically Available, Soft state, Eventual consistency), a consistency model used in distributed computing to achieve high availability, with (weak) guarantees that, if no new updates are made to a given data item, eventually all accesses to that item will return the last updated value.
    • Strong vs eventual consistency

      • Database systems designed with traditional ACID guarantees in mind such as RDBMS choose consistency over availability, whereas systems designed around the BASE philosophy, common in the NoSQL movement for example, choose availability over consistency.
    • Relational database vs NoSQL

      • RDMS
      • NoSQL types:
        • key/value pairs
        • wide column
        • document based
        • graph based
    • Partitioning/sharding data

      • Partitioning performs division of a logical database or its constituent elements into distinct independent parts.
      • Sharding, a horizontal partitioning, is the act of taking a data set and splitting it across multiple machines.
      • Consitent Hashing is a method/algorithm to determine which machine any particular piece of data goes to.
    • Optimistic vs passimistic locking

    • Cache policy

    • Datacenter, racks, hosts

    • CPU, memory, hard drive, and network bandwidth (limited resources)

    • Random vs sequential read/write on disk

  • Concepts in IT

    • http vs http2 vs websockets
    • tcp/ip model (vs osi 7 layers), tcp (document) vs udp (stream/video)
    • ipv4 (32-bit) vs ipv6 (128-bit)
    • dns lookup
    • https, TLS, public key infrastructure and certificate authority (CA)
    • symmetric (e.g. AES) vs asymmetric (public/private key) encryption
    • load balancer: L4 () or L7
    • edges server (e.g. CDN)
    • space-efficient probabilistic data structure
    • consensus over distributed hosts (Paxos)
    • design patterns and object oriented design
    • virtual machine and containers
    • publisher-subscriber over queue (pub-sub architecture)
    • multithreading, locks, synchronization, CAS (compare and swap)
    • map reduce
  • Technologies

    • Cassandra (wide column, high scalability)
    • MongoDB / Couchbase
    • Memcached, Redis (distributed cache)
    • Zookeeper (centralized configuration management, all in mem, scale for reads, but not for writes)
    • Kaffa (fault tolerant high available queue, deliver message once, keep in order)
    • HAProxy

Example: Find the peakest time

From a long list of trip time (with Start and End time, sorted by Started), e.g. within a whole month, find the peakest trip time.¶

type Trip struct {
  Start time
  End time
}

// assuming the list (e.g. a whole month/year) is sorted by start time
func countPeakTrips(a []Trip) int {

}

Example: Design a stock data system

  • requirements
    • get real time price
    • get historic trend
  • data schema and database
  • misc consideration

Resources:


Mathematics

Big prime number

Resources


Sorting

Big-O notation

Algorithm Best Average Worst Space
Bubble-sort n n^2 n^2 1
Bucket-sort n+k n+k n^2 n
Cube-sort n n log(n) n log(n) n
Heap-sort n log(n) n log(n) n log(n) 1
Insertion-sort n n^2 n^2 1
Merge-sort n log(n) n log(n) n log(n) n
Quick-sort n log(n) n log(n) n^2 log(n)
Radix-sort n*k n*k n*k n+k
Selection-sort n^2 n^2 n^2 1
Shell-sort n log(n) n log(n)^2 n log(n)^2 1
Tree-sort n log(n) n log(n) n^2 n


Bubble Sort

def bubble_sort(arr, reversed_order=False):
    x_range = range(len(arr)-1, 0, -1)
    for num in x_range:
        for i in range(num):
            is_upper = not reversed_order and arr[i] > arr[i+1]
            is_lower = reversed_order and arr[i] < arr[i+1]
            if is_lower or is_upper:
                arr[i], arr[i+1] = arr[i+1], arr[i]


Insertion Sort

def insertion_sort(arr):
   for ndx in range(1, len(arr)):
     pos = ndx
     val = arr[ndx]
     while pos > 0 and arr[pos-1] > val:
         arr[pos] = arr[pos-1]
         pos = pos-1
     arr[pos] = val


Selection Sort

def selection_sort(arr):
    x_range = range(len(arr)-1, 0, -1)
    for num in x_range:
        pos = 0
        for ndx in range(1, num+1):
            if arr[ndx] > arr[pos]:
                pos = ndx
        arr[num], arr[pos] = arr[pos], arr[num]


Merge Sort

def merge_sort(arr):
    if not isinstance(arr, list) or len(arr) <= 1:
        return
    mid = len(arr) // 2
    hal, har = arr[:mid], arr[mid:]  # copy to new slices

    merge_sort(hal)  # sorting the left half
    merge_sort(har)  # sorting the right half

    i = j = k = 0
    # copy data to new slices `hal` and `har`
    while i < len(hal) and j < len(har):
        if hal[i] < har[j]:
            arr[k], i = hal[i], i+1
        else:
            arr[k], j = har[j], j+1
        k += 1
    # checking if any element was left
    while i < len(hal):
        arr[k], k, i = hal[i], k+1, i+1
    while j < len(har):
        arr[k], k, j = har[j], k+1, j+1


Quick Sort

def _check_median_pivot(arr, li, hi):
    mi = (li + hi) / 2
    lv, lv_is_num = _get_num(arr[li])
    mv, mv_is_num = _get_num(arr[mi])
    hv, hv_is_num = _get_num(arr[hi])
    if mv_is_num and (not lv_is_num or mv < lv):
        arr[li], arr[mi] = arr[mi], arr[li]
    if hv_is_num and (not lv_is_num or hv < lv):
        arr[li], arr[hi] = arr[hi], arr[li]
    if mv_is_num and (not hv_is_num or mv < hv):
        arr[mi], arr[hi] = arr[hi], arr[mi]

def _get_num(v):
    try:
        v = float(v)
        v = int(v)
    except Exception:
        pass
    return v, isinstance(v, (int, float))

def _get_pivot(arr, li, hi, median_pivot=False):
    if median_pivot:
        _check_median_pivot(arr, li, hi)
    si = li  # index of smaller element, starting at lower index
    pivot, pivot_is_number = _get_num(arr[hi])  # initial pivot
    for xi in range(li, hi):
        v, v_is_number = _get_num(arr[xi])
        # moving non-number to higher position
        if v_is_number and (not pivot_is_number or v < pivot):
            arr[si], arr[xi] = arr[xi], arr[si]
            si = si + 1
    arr[si], arr[hi] = arr[hi], arr[si]
    return si

def quick_sort(arr, li=0, hi=None, level=0):
    if not arr or not isinstance(arr, list):
        return
    sz = len(arr)
    li = li if isinstance(li, int) and 0 <= li and li < sz else 0
    hi = hi if isinstance(hi, int) and 0 <= hi and hi < sz else sz-1
    if li < hi:
        pivot = _get_pivot(arr, li, hi)
        # print('sorting {}, pivot {} from {} to {}'.format(arr, pivot, li, hi))
        quick_sort(arr, li, pivot-1, level+1)
        quick_sort(arr, pivot+1, hi, level+1)


Quiz, IQ, and Brain Teaser

  • Calculate the angle at any specific clock time.

  • From 10 boxes of balls, with one box contains 9g balls and other contain 10g balls. How to weigh one time and find the 9g-ball box?

  • From 12 identical looking balls, use a simple mechanical balance 3 times, how to find the only-one fake ball (which could be lighter or heavier)?

  • Four people, sharing 1 torch, need to cross a bridge can only support maximum 2 persons. Each speed at 1m, 2m, 5m, and 8m, can all pass in 15-minute?

  • Only 2 cuts allowed, paying one gold bar to a worker in 7 days (1 week).

  • Use 3 gallon jug and 5 gallon jug, measure out exactly 4 gallons.

  • https://blog.codinghorror.com/classic-computer-science-puzzles/


Software Testing

Test categories:

  • spec test
    • positive test
    • negative test
  • edge test cases, including out-of-range
  • type check
  • branch check
  • exception check
  • run time error

Advanced:

  • usability test
    • user interface and UX (user experience) - (consumer) product
    • UI and readability (easy to read/understand)
    • spec/doc, or user manual/guide
  • performance test (load test / stress test)
  • security test: (password exposure, input and logging, wired clear text, data loss, data integrity)
  • risk test
  • localizations: (non-ascii, unicode, different language documentation, user interface)
  • integration
    • installation and setup
    • environment (os system/platform, cloud platform)
    • ad hoc / smoke test / acceptance test / alpha / beta
    • accessibility test, compatibility test
    • work with other services/products


Networking

Data Transfer and Latency

Computer Event Speed Avg Latency Normalized Scale
3 GHz CPU 0.3 ns 1s
Cache (L1) 3.5GB/s 0.9 ns 3s
Cache (L2) 3GB/s 2.8 ns 9s
Cache (L3) 12.9 ns 43s
RAM 2GB/s ~100 ns 4m
SSD 200MB/s 25~150 μs 1~4 days
Hard Drive 150MB/s 1~10 ms 1~9 months
Internet SF-NYC 65 ms 5 years
Internet SF-Hong Kong 141 ms 11 years

Online network tools

Downloadable tools

Resources

Free/public APIs


Non-technical


Reading


» Back to Contents | Home « Dockerian