(S)am's insights2024-03-16T17:04:19+00:00http://sam-saarinen.github.io/insights(S)amOrganizational Code Management - Scaling Across Projects and Teams2024-01-23T00:00:00+00:00http://sam-saarinen.github.io/insights/2024/01/23/organizational-code-management<p><strong>tl;dr:</strong> communication between people / contextual overhead rapidly bottlenecks scaling codebases; there’s not a clear best solution, but there are some decent options.</p>
<p>Over the last 7 years, I’ve revisited this question quite a bit. How do I best organize people and code to create efficient processes for sharing work across project scopes? There are tradeoffs among the options used by some of the most prolific organizations out there, and it seems that refusing to have an opinion is worse than most of those options. But why is this a problem and how does it change as the organization grows?</p>
<p>Although I’ve posted this online for public benefit, this is really just a bikeshedding document so I don’t keep spending time on this question.</p>
<h2 id="introduction">Introduction</h2>
<p>When working as a solo developer on a single project, it takes a while before project scope requires being really disciplined about code organization. A lightweight folder system that taxonomizes source types (e.g. static assets vs. code) and groups functionally-related files (e.g. source files supporting the same feature or tool) goes a <em>really</em> long way. With that said, even as a solo developer, there are times when project subcomponents are deployed to separate endpoints (e.g. frontend and backend, or scripting layer and low-level procedural layer). Redundancy is generally to be avoided (DRY - Don’t Repeat Yourself) because instances that should be updated simultaneously often end up decoupled, leading to a proliferation of preventable bugs and delaying rollout of fixes. Redundancy also creates a small amount of overhead in the development process. Within the scope of a single project with a single deployment mechanism, source code can usually be refactored to avoid redundancy (usually worth the effort after you’ve manually solved the problem 3 times) and imported in multiple places. But when there are multiple build paths, it can become difficult to share and synchronize functionality or definitions between, for example, frontend and backend interfaces.</p>
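<p>As a rough illustration of the single-source-of-truth idea above, here is a minimal Python sketch. Every name in it (<code>UserRecord</code>, <code>MAX_USERNAME_LENGTH</code>, <code>backend_serialize</code>, <code>client_parse</code>) is hypothetical and only stands in for whatever definitions a real project would share. It assumes both ends can import the same module, which is exactly the assumption that breaks down once there are separate build paths.</p>

```python
import json
from dataclasses import dataclass, asdict

# Hypothetical shared definitions: one place to update, imported everywhere.
MAX_USERNAME_LENGTH = 32


@dataclass(frozen=True)
class UserRecord:
    username: str
    email: str


def validate_username(username: str) -> bool:
    """Single validation rule used by every caller."""
    return 0 < len(username) <= MAX_USERNAME_LENGTH


def backend_serialize(user: UserRecord) -> str:
    """'Backend' side: validates with the shared rule, then emits JSON."""
    if not validate_username(user.username):
        raise ValueError("invalid username")
    return json.dumps(asdict(user))


def client_parse(payload: str) -> UserRecord:
    """'Client' side: rebuilds the same shared type from the JSON payload."""
    return UserRecord(**json.loads(payload))
```

<p>If the length rule changes, it changes in one place and both sides pick it up. With a frontend written in another language, the same definition would have to be duplicated or generated, which is where the synchronization problem starts.</p>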
<p>When there are multiple projects and/or multiple teams, these kinds of problems rapidly compound. Projects may benefit from shared code, but (especially when different clients or codebase owners are involved), shared code needs to be scoped separately from their containing projects. When multiple teams are involved (especially cross-functionally), there’s also a problem of code discoverability. How can each team know whether the thing they need already exists (even approximately) across the organization’s owned codebases?</p>
<p>This is the problem of organizational code management. Some common solutions are outlined in each of the following sections.</p>
<h2 id="option-0-each-team-for-itself">Option 0: Each Team for Itself</h2>
<p>This is the de facto strategy for most small-to-mid-size organizations, and it works reasonably well when organizations have fewer than 100 people. Generally, people know the roles of everyone else in the organization (or can find out quickly), and they can figure out who would own certain functionality if it existed. Group communication tools like Slack also make it possible for parallel roles (e.g. developers on different sub-teams) to communicate efficiently. As organizations scale, tools like Stack Overflow help to propagate standard answers to common problems, and good ideas tend to survive and be imitated. This requires relatively little direct management (culture around documentation and question-answering requires a small amount of reinforcement), and it works well when individual teams can bear final responsibility for software functionality (e.g. there’s not a lot of company-wide liability for feature-specific bugs).</p>
<p>This is also the de facto organization for most of the open source community. Although large projects are sometimes moderated and have coordinated sub-teams, they are often organized around a single delivery mechanism and benevolent dictators can prioritize project maintainability over speed of delivery (since there’s rarely a revenue incentive for open-source projects). Across the myriad projects of the non-organized public, everyone makes their own decision and successful projects survive (evolutionarily) to inspire imitation.</p>
<h2 id="option-1-complete-encapsulation---microservices">Option 1: Complete Encapsulation - Microservices</h2>
<p>Amazon famously has small teams that build its web services as self-contained functionality that’s accessed via an API (even internally, or so we’re told). This works very well when services can be decoupled effectively, and it makes it easy to track within-organization usage and impact. Downsides are that it’s hard to encourage or enforce best practices across teams, so rotating new personnel onto teams introduces more onboarding overhead. It’s also easy to end up with non-obvious redundancy when functionality that’s too small or abstract to be easily marketed across teams ends up being recreated. Finally, there’s a non-negligible overhead to requiring all services to be invoked via web APIs (rather than direct</p>
<h2 id="option-2-shared-functionality---package-management">Option 2: Shared Functionality - Package Management</h2>
<p>npm, pip / conda, and many other language-specific package managers exist specifically to enable packaging of source code that can be easily shared across projects and imported into new source. Many of these package managers also support private packages. This is a great solution when all of the code is written in one language and when code ownership is a strong organizational principle. Downsides are that this makes it hard to reuse or adapt code that does <em>almost</em> the right thing (compared to having access to the source code), and it generally requires strong documentation practices and within-organization discoverability.</p>
<h2 id="option-3-shared-code---git-submodules">Option 3: Shared Code - Git Submodules</h2>
<p>Git Submodules allow distributed teams to propagate changes back to source modules with easy testing in-context. This is a big advantage when the same people or teams are working on multiple projects or packages, or when some of the packages are communal or experimental. It’s also much easier for submodule consumers to debug with complete access to the source. In most other ways, this is less convenient than package management as it induces additional workflow steps to keep submodules updated.</p>
<h2 id="option-4-complete-integration---monorepo-monolithic-repositories">Option 4: Complete Integration - Monorepo (monolithic repositories)</h2>
<p>Several large organizations (reportedly, Google) use essentially a single repository. This has enormous benefits from a DevOps point of view when it comes to enforcing organization-wide code quality and style, managing codebase security, and streamlining deployment pipelines. Major drawbacks include the logistical overhead of dealing with unwieldy, large codebases and the overhead of engineering QA on prototyping processes. When the whole codebase is available, however, searchability slightly improves cross-organization discoverability.</p>
<h2 id="option-5-strategic-hybrids">Option 5: Strategic Hybrids</h2>
<p>In practice, I think most CTOs / DevOps VPs / managers agree that the optimal option is probably context-specific. Trying too many strategies at once can lead to organizational confusion and can quickly devolve into high-overhead chaos. But there’s probably a sensible project lifecycle that involves moving projects between code management strategies as the project maturity and personnel structure change. Feasibility prototypes and viability experiments can afford to be scrappy and undisciplined because the bottleneck is not maintenance cost but speed of experimentation. Long-term projects with rotating personnel should be explicitly managed with a strategy that makes sense for the level of encapsulability of the project and its components.</p>
<p>For my team (still small), we’re experimenting with Git Submodules as a way of sharing type definitions and utility code across projects and deployment endpoints (that the same people are working on). As we mature as an organization, we’ll likely create more packages (and open-source many of them, to encourage interoperable community-driven development). I’m skeptical of the value of monorepos in multifaceted organizations with active experimentation. Education is also a relatively low-risk deployment domain. I expect that semantic search tools will continue to improve and will ameliorate many of the discoverability issues that affect all of these models.</p>
<p>Thanks,<br />
- (S)am</p>
[List] Games I Recommend2023-11-08T00:00:00+00:00http://sam-saarinen.github.io/insights/2023/11/08/games-i-recommend<p>Games are distinct from other media, and unique in their ability to communicate about systems, and to engage the player personally. Games can make players feel accomplishment, remorse, or empathy in ways that other media struggle to capture. Electronic games are generally more expressive than physical games, but I’ve listed some of each below.</p>
<h2 id="electronic-experiences">Electronic Experiences</h2>
<h3 id="braid">Braid</h3>
<h3 id="star-wars-knights-of-the-old-republic">Star Wars: Knights of the Old Republic</h3>
<h3 id="the-legend-of-zelda-breath-of-the-wild">The Legend of Zelda: Breath of the Wild</h3>
<h2 id="physical-experiences">Physical Experiences</h2>
<h3 id="contact">Contact</h3>
<h3 id="liars-poker">Liar’s Poker</h3>
<h3 id="ninja">Ninja</h3>
<hr />
[List] Movies (and Series) I Recommend2023-11-06T00:00:00+00:00http://sam-saarinen.github.io/insights/2023/11/06/movies-i-recommend<p>It’s difficult to compare movies across genres, and there are movies that have relevance to the history of the medium that just aren’t as enjoyable to watch now. With that said, these are some movies that I would generally recommend to people <em>today</em>. I’ve also mixed in a TV series or two. I’ve generally recommended things that I think are <em>worth</em> watching, not just enjoyable to watch. This list will be updated periodically as I change my mind and as I see more movies. There are many excellent movies that are not on the list.</p>
<h3 id="the-incredibles-2004">The Incredibles (2004)</h3>
<p>If forced to pick a favorite, this might be it. An allegory for middle-class America wrapped up in comic-book superheroism. An object lesson in irony. And so many quotable one-liners! The only thing I might fault is that the animation will continue to age as computers and art advance.</p>
<p>If I wrote down everything I loved about this movie (and its quite good sequel), I suspect I would lose most of my audience. But here are just a few:</p>
<ul>
<li>Mr. Incredible (Bob) creates his own nemesis (Buddy), sparks his later crisis of identity through his inability to choose between heroism and his personal life, and is ultimately redeemed through his character growth (rather than insisting on doing everything alone, he overcomes the challenges he created with his family).</li>
<li>Syndrome (Buddy) precipitates his own undoing (Mirage, the robot), through his obsession with recognition. The pride before the fall.</li>
<li>There is a massive amount of ironic prescience in the movie: “He started monologuing!”; “Let me guess, it got smart enough to wonder why it was taking orders.”; and of course, “No capes!”</li>
<li>In each of their ways, the Parr children struggle with being exceptional. Dash and Syndrome share almost identical lines - “When everyone’s special, then no one will be.”</li>
<li>Bomb Voyage!</li>
<li>The Underminer! “I am always beneath you, but nothing is beneath me!”</li>
</ul>
<h3 id="avatar-the-last-airbender-2005-2008">Avatar: The Last Airbender (2005-2008)</h3>
<p>The series definitely has its rough spots (a few noticeable animation loops and they were still finding their rhythm during the first 2-3 episodes), but this is hands-down my favorite series, and as valuable as any of the movies on this list. Phenomenal writing, and many things to think about. What’s the real cost of war? Are leaders called to sacrifice their own morality for the good of the people (a classic deontological/consequentialist conundrum)? Is there anywhere where cabbages will be safe?</p>
<h3 id="gandhi-1982">Gandhi (1982)</h3>
<p>This movie gives me hope. This movie is how I first learned about the historical person of Gandhi, and it had an enormous effect on how I conceive of my life’s work and value. “There are many causes for which I would be willing to die, but there is no cause for which I would be willing to kill.”</p>
<p>Is it possible to change the world without acts of violence? I sure hope so. Maybe this wouldn’t be possible without the free press, global trade, and a kind of public moral authority. Maybe it’s not the way to solve challenges like global terrorism. But I’m going to keep trying to figure out how it can be.</p>
<h3 id="spider-man-across-the-spider-verse-2023">Spider-Man: Across the Spider-Verse (2023)</h3>
<p>Although its predecessor, Into the Spider-Verse, was excellent, this movie was phenomenal. Miles Morales comes into his own (it’s a coming of age story), fights his fate (it’s a sci-fi story), fights his past (it’s a super-hero story), and fights his friends (it’s a good story). Animation is used expressively in ways that photorealistic CGI has not been (I liked the augmented-reality-esque animation in Ms. Marvel for similar reasons), and I love the way color and texture are used in the scenes with Gwen’s father. The movie is well-situated in its own cultural context, acknowledging the myriad Spider-Man media that came before while also reacting against it. Even though there’s clearly going to be a third movie in the series, this movie feels like a complete arc in its own right, as Miles decides for himself who he’s going to be.</p>
<h3 id="interstellar-2014">Interstellar (2014)</h3>
<p>As I was walking into the movie, I told my friends I didn’t want to get too optimistic; hard sci-fi movies often have a surprising number of unnecessary technical inaccuracies, such as failing to visualize gravitational lensing in their stellar views near black holes. After a brief stint in the movie, I was willing to suspend my disbelief and trust the writers.</p>
<p>With that said, there are still all manner of problems with the movie, mostly in terms of things characters should have known or anticipated, but didn’t. The climax has also been controversial for viewers for whom “sufficiently advanced technology” is indistinguishable from “magic”. But this makes my list of recommended movies for its use of physical truths to drive allegorical discussions of the human condition. “Newton’s Third Law: The only way we’ve found of getting anywhere is to leave something behind.”</p>
<h3 id="the-dark-knight-2008">The Dark Knight (2008)</h3>
<p>Can a hero sacrifice someone else for the greater good?</p>
<h3 id="groundhog-day-1993">Groundhog Day (1993)</h3>
<p>I’m a sucker for time travel (which is a great literary tool for exploring themes of regret, fate, mortality, and choice, but it’s easy to botch). Groundhog Day is a delightful mix of surprising applications of an unwanted superpower, low-brow slapstick, and profound examination of the human condition. Probably my favorite dialogue is when Rita says, “I could never love someone like you. You only love yourself.” And Phil replies, “That’s not true. I don’t even like myself.”</p>
<p>In the time-travel vein, I also like Next (2007), because it raises some implications (can time-travel solve NP-hard problems?) and uses some creative cinematography. If I didn’t have either of those on the list, I would probably include Tenet (2020).</p>
<h3 id="2001-a-space-odyssey-1968">2001: A Space Odyssey (1968)</h3>
<p>This movie is odd. I did not like it the first time I saw it, and I didn’t like it the second time I saw it. The third time I saw it (somewhat older, and having heard some commentaries), I started to develop an appreciation for it as a work of art. I wouldn’t call this “casual viewing”, but I do recommend it to anyone who’s looking for a take on what drives growth of consciousness. My current take is that the movie’s answer is “contemplation of the unknown”, although I think “mutual battles for survival” is probably equally defensible; I just don’t want it to be true.</p>
<h3 id="the-black-panther-2018">Black Panther (2018)</h3>
<p>“You are a good man. And it is hard for a good man to be king.”</p>
<p>This movie has such a well-constructed plot, literary irony, and social commentary. I love the ancestral plane sequences; to me, those are the heart of the movie. Who are you (where do you come from), and what will you do with power?</p>
<h3 id="the-black-phone-2021">The Black Phone (2021)</h3>
<p>I don’t watch many horror movies (although Alien is something of a classic), and I rarely watch R-rated movies, but I made an exception for my sister’s birthday. This movie feels really well constructed; from a story-telling point of view, the twists and resolution feel both surprising and earned. I also like that the supernatural elements of the movie can be interpreted literally or experientially, but they’re an essential part of the storytelling. It’s rare to see that kind of ambiguity pulled off.</p>
<h3 id="gravity-2013">Gravity (2013)</h3>
<p>Is this a survival thriller set in outer space, a human story about grief and isolation, or a broader allegory for the human condition in face of crisis? Why not all three? This movie won a bunch of awards, and deservedly so.</p>
<hr />
[List] Actual 'Life Hacks'2023-10-31T00:00:00+00:00http://sam-saarinen.github.io/insights/2023/10/31/life-hacks<p>Last Updated: 2023-10-31</p>
<p>I haven’t found lists of “life hacks” to be super helpful in general, I think because their practicality and surprisingness are often context- or person-dependent. But there are a few that I’ve personally gotten a lot of mileage out of, and I’m putting this list together in hopes of maybe saving the next person some time. I’ve tried to organize them by category. To distinguish these from other lists I write, these are generally not about a tool, per se, but are about non-obvious uses/choices of items to solve minor but regularly occurring problems.</p>
<h3 id="food">Food</h3>
<ul>
<li>Most chip bags can be kept airtight without a bag clip with some careful folding. Roll the flattened top down, fold the sides in a small amount horizontally, and then invert the folds. It also makes a nice handle.</li>
<li>To make chips easier to reach and serve, open the top, then gently roll/invert the chip bag from the bottom. If you’ve done it correctly, it should create a free-standing bowl with chips available at the very top. Push the bottom further in as necessary.</li>
<li>Square cocktail napkins can be laid in a classier helix by gently turning your knuckles on the top of the stack.</li>
</ul>
<h3 id="clothing">Clothing</h3>
<p>Unfortunately, this category is probably most relevant to men. I mean no disrespect to anyone who doesn’t wear men’s clothing, I just doubt that I’ll have helpful tips for you in this category.</p>
<ul>
<li>Bowties are more practical than ties. Not that neckwear or collars are particularly practical to begin with, but bowties require less work to keep clean and stay tidy.</li>
<li>There’s more than one way to tie a necktie. My personal favorite knots are the “trinity” and the “half-windsor”. When I was younger (and bowties weren’t a personal brand), I would regularly mix it up in order to stay noticeable in rooms full of much older professionals.</li>
<li>When pushing sleeves up, rather than repeatedly rolling the cuff, fold/invert the sleeve all the way from your wrist to the middle of your upper arm, then fold from the edge (now just below your elbow) to the same spot on your upper arm. This generally produces crisply shaped cuffs at the right position above your elbow, and it’s much easier to do and undo as the situation requires (moving from an overheated lecture hall to a wintry exterior, for example).</li>
</ul>
<h3 id="email">Email</h3>
<ul>
<li>Auto-archive / filter emails with the word “unsubscribe”.</li>
<li>Automatically label and de-inbox newsletters and other recurring messages that don’t generally require responses.</li>
</ul>
<h3 id="personal-management">Personal Management</h3>
<ul>
<li>For some people, a lack of ideas (stemming from a lack of creativity or lack of ambition) is a problem. For me, the problem is always having more opportunities than time. I’ve had many people say things like “execution is more important than the idea”, but the phrase that reminds me that prioritization is the simplest optimization is, “<em>Speed comes from focus.</em>”</li>
</ul>
<h3 id="underrated-purchases">Underrated Purchases</h3>
<ul>
<li>The Tub Shroom is a small, inexpensive object that sits in the drain of a bathtub and discreetly catches hair (for long-haired individuals, this is a common source of clogged shower drains). I haven’t found any comparable option that’s as easy to clean as their rubber version.</li>
</ul>
<hr />
[List] Books on Doing Good2023-06-06T00:00:00+00:00http://sam-saarinen.github.io/insights/2023/06/06/Books-on-Doing-Good<p>Last Updated: 2024-03-16</p>
<p>This is a page (that will be updated regularly) of books that I’ve read (and at least partially recommend) on how to do good in the world. While there may be some books that stray into the philosophical and ethical, most of them will be concerned more with the “how” than the “why”. Most of the books on this list I can’t recommend in my general reading list as they are audience/intent specific, but for me, reading them has been an essential part of my work.</p>
<h3 id="invention-and-innovation-a-history-of-hype-and-failure-by-vaclav-smil"><em>Invention and Innovation: A History of Hype and Failure</em> by Vaclav Smil</h3>
<p>Smil explores (through detailed historical anecdote) our tendency to overestimate the impact and benefit of the new. I have a difficult time recommending this book wholesale as I think the main argument of the book isn’t necessarily proven (in a logical sense) by the 9 primary examples carefully selected and exposited by the author, but the awareness of constraints and nuance that he brings are most certainly useful. Some of the main ideas from the book:</p>
<ul>
<li>Naive belief that all of our problems will be solved by a miracle technology in the next 10, 20, or 50 years typically ignores fundamental physical constraints, the cost and complexity of engineering after a fundamental discovery, the history of work on the problem already, and the economic and political realities the prospective technology will interact with.</li>
<li>In the case of climate change, in particular, the decarbonization goals set by the UN (and its member countries) are optimistic independent of the availability of new inventions. As an example, electrifying transportation by the dates set will require more electric vehicles to be produced each year than all combined vehicle production in any year prior.</li>
<li>Many hard problems require holistic solutions, involving collective changes in more than one area.</li>
<li>Even incredible advances in computational technology and AI don’t trivialize problems bound by physical constraints - e.g. the search for better refrigerants, where nearly all feasible molecules have been explored, or in-air transportation speeds, where speed and efficiency are directly at odds.</li>
<li>Real tradeoffs between different forms of wellbeing among different people have to be navigated, and industrial-scale mass-application of technologies often has unintended consequences, sometimes both long-lasting and undetectable for a very long time.</li>
<li>We tend to overstate the significance of individual discoveries.</li>
</ul>
<p>I don’t agree with all of the conclusions Smil draws, but I find his sober (and counter-culturally concrete) evaluation of technological progress to be quite helpful in rebalancing my expectations for the future.</p>
<h3 id="the-power-to-get-things-done-whether-you-feel-like-it-or-not-by-seve-levinson-phd-and-chris-cooper"><em>The Power to Get Things Done (Whether You Feel Like It or Not)</em> by Steve Levinson, PhD, and Chris Cooper</h3>
<p>This is more or less a self-help book focused on “Follow-Through”, but is informed by both a clinical and consulting background. The punchline of the book is essentially that motivation is temporary and unreliable, so it’s important to use moments of motivation to change your future circumstances so that you’ll do the right thing when the time comes. One of my favorite examples from the book was a business executive who hated going to the gym, so he decided to keep his deodorant in his rented gym locker. Not having any at home, he would have to physically go to the gym before work anyway or risk the embarrassment of growing smelly. Leveraging your weak motivation into greater follow-through requires some creativity and self-knowledge, but common strategies include involving other people, creating consequences (social or practical), making non-compliance impossible (and removing temptations), and replacing “achievement” goals with “showing up” goals.</p>
<p>In the context of doing good, my main takeaway is that there’s strong evidence from psychological research that my good intentions won’t result in doing good. I can’t personally conceive of the true magnitude of the problems we’re tackling — who can imagine millions of people suffering in a way that’s truly more felt than the suffering of one person that you know well? So I need to use my rare rational and sober-minded moments to trick, coerce, and manipulate my future self into doing the right thing by making it near-impossible for my future self to do otherwise.</p>
<h2 id="business-entrepreneurship-and-management">Business, Entrepreneurship, and Management</h2>
<p>This subsection is specifically on books that relate to starting, growing, and managing organizations. Organizations are powerful tools for increasing the scale of impact.</p>
<h3 id="zero-to-one-by-peter-thiel"><em>Zero to One</em> by Peter Thiel</h3>
<p>Peter presents his philosophy on what makes a good startup. Highly impactful startups are those that create something new (the number existing goes from “zero to one”), as opposed to those that repeat and refine what has come before (“one to many”). Starting a business is hard, and there’s more implicit competition for money, time, and attention than we realize. Startups shouldn’t delude themselves into thinking that a highly targeted and untapped market exists, and should explicitly focus in on a niche where they can provide a solution that’s at least 10x better than anything already available. A good indicator of a startup’s likelihood of success is its “monopoly potential”. Is there a reason why the company might be the “last mover” in a field? (Note: Peter interprets monopoly more generally/pragmatically than in legal practice. To him, Google has a monopoly on search, for example, despite the existence of plausible alternatives.) Peter suggests that the amount of value created and the proportion of that value captured by a company are largely independent. He argues that the greatest value is in creating new markets, which is also where the largest percentage of the created value can be captured (because it isn’t driven down by competition).</p>
<p>Peter also offers some practical wisdom on getting started. Founding teams matter in terms of their expertise, but even more so in terms of their ability to collaborate productively. Compensation and incentives need to induce long-term values alignment. The startup needs to view hiring as a core competency (it shouldn’t be outsourced), and needs to have a compelling reason for talented individuals to pass up higher-compensation opportunities. The company should outsource activities that are not central to its 10x advantage.</p>
<p>While I don’t think Peter’s rules have to be hard and fast, the premise that enduring companies start by rapidly capturing a niche market with a 10x advantage is a useful prioritization mechanism over possible directions and ideas. But this is only helpful to the extent that the identified niche can be reliably targeted and scalably serviced.</p>
<h3 id="the-lean-startup-by-eric-ries"><em>The Lean Startup</em> by Eric Ries</h3>
<p>I was put off from reading this book for many years by people who misquoted and apparently misunderstood its primary message. The book is old enough (published 2011) and influential enough that many of its ideas have become quite widespread, at least as memetic touchpoints. As far as I can tell, Eric coined the term “pivot” in the context of entrepreneurial strategy, popularized the idea of “validating” testable business “hypotheses”, and popularized the idea of a “minimum viable product”. A great irony is that Eric (who is an engineer, and suggests applying the engine of science to discovery of viable businesses) has had his ideas misrepresented by a great number of people without scientific training who use the vocabulary of science, but not its substance. (This is ironic because he explicitly warns against the development of an entrepreneurial pseudoscience near the end of his book.) When I finally got around to actually reading the book, it was revelatory.</p>
<p>The title of the book comes from the inspiration that he drew from Lean Manufacturing (initially driven by innovation at Toyota) in creating organizations that could rapidly discover a viable business under conditions of extreme uncertainty. He posits that for a startup, the primary measure of progress should be validated learning (the development of a theory about the customers, product, and market at large that informs testable improvements to the business), and he speaks against the procedural waste of working very hard to ultimately deliver a failing product. (A common misconception is that “lean” refers to the startup’s finances, not its production. The book is, in fact, about minimizing wasted business development effort.)</p>
<p>Eric posits that every viable business is based on at least two premises (“hypotheses”). The <strong>value hypothesis</strong> is a prediction about a kind of product/service/activity/result that will be perceived as valuable by customers. The <strong>growth hypothesis</strong> is a prediction about a mechanism by which the number of customers engaging with the business will grow. The goal of a startup should be to develop validated theories for each of these questions by experimentally invalidating key assumptions as quickly as possible. Eric elaborates on key processes that help accelerate the “Build-Measure-Learn” cycle, allowing faster end-to-end learning on statistical samples of real customers, measuring real behavior.</p>
<p>One of the most valuable parts of the book was Eric’s elaboration on effective metrics for testing growth hypotheses. He identifies three kinds of business-driven growth: sticky, where users accumulate because they stay a long time, and where cohort-based retention/churn metrics are the most meaningful; viral, where the product itself drives engagement of new users, and where the “infection rate” is the key thing to measure; and paid, where revenue is spent directly on acquiring new customers, and where the cost of acquiring a customer is the key metric. The sticky/viral/paid ontology is artificial and imperfect — most businesses are composites — but the question of whether growth is being primarily driven by current users, the product, or revenue is very helpful in identifying which growth metrics should be prioritized.</p>
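<p>To make the three growth metrics concrete, here is a minimal Python sketch. The function names and formulas are my own simplifications, not definitions taken from the book: retention is modeled as the fraction of a signup cohort still active, the “infection rate” as new users brought in per existing user, and acquisition cost as spend divided by customers acquired.</p>

```python
def retention_rate(cohort_size: int, still_active: int) -> float:
    """Sticky growth: fraction of a signup cohort that is still active."""
    if cohort_size <= 0:
        raise ValueError("cohort_size must be positive")
    return still_active / cohort_size


def viral_coefficient(existing_users: int, referred_signups: int) -> float:
    """Viral growth: new users brought in per existing user.

    A value above 1.0 means each user brings in more than one new user
    on average, so the user base grows without outside spending.
    """
    if existing_users <= 0:
        raise ValueError("existing_users must be positive")
    return referred_signups / existing_users


def customer_acquisition_cost(spend: float, new_customers: int) -> float:
    """Paid growth: money spent per newly acquired customer."""
    if new_customers <= 0:
        raise ValueError("new_customers must be positive")
    return spend / new_customers
```

<p>For a composite business, all three can be tracked side by side; the question of which engine is doing most of the work decides which number deserves the most attention.</p>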
<p>For those new to Lean methods (or those with only secondhand exposure) - the illustrative analogies to other business types will also be very helpful. It is easy for individuals to feel they are being productive when they are engaging in their (individually) highest-value activity. But this leads to unseen or unowned waste (e.g. unusable inventory, engineering of products no one will buy). It’s critical to reorient teams around organizational success metrics that prioritize total system performance.</p>
<h3 id="the-goal-by-eliyahu-m-goldratt"><em>The Goal</em> by Eliyahu M. Goldratt</h3>
<p><em>The Goal</em> is a business fable that focuses on the application of lean processes to modern manufacturing. After reading <em>The Lean Startup</em>, this (and the included article at the end, “Standing on the Shoulders of Giants”) gave me a much deeper understanding of the principles motivating lean system design. Where before, Kanban-based agile task planning (controlling the amount of “inventory” at any stage of completion) was incomprehensible to me, I now have a lot of clarity around when it is and isn’t beneficial and what problem it’s trying to solve.</p>
<p>Some of the big ideas of the book are that the thing that matters (the goal) should be defined at the whole system level, and often requires sacrificing the “efficiency” of system subparts. This is very counterintuitive to people whose view is on an individual component of the system (say a component, machine, worker, member, or employee). In fact, overproduction of non-rate-limiting parts (or completion of unnecessary tasks) is not just a form of waste, but often counterproductive (backlog build-up and resource activation slow down production along the critical path). The book explores a number of concepts and techniques that are abstract enough to be applied to a wide variety of operational tasks. Another major counterintuitive insight is that optimizing for throughput, or flow (minimizing total per-task production time) is generally much better for a business’ bottom line than focusing on operational cost savings.</p>
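<p>The point about non-rate-limiting parts can be sketched with a toy model. This is a simplified illustration under assumed conditions (a serial line, fixed hourly rates, hypothetical stage names), not a model taken from the book: the line as a whole can only sustain the rate of its slowest stage, and any extra input simply accumulates as work in progress.</p>

```python
from __future__ import annotations


def line_throughput(stage_rates: dict[str, float]) -> tuple[str, float]:
    """Return the bottleneck stage and the rate the whole serial line can sustain."""
    bottleneck = min(stage_rates, key=stage_rates.get)
    return bottleneck, stage_rates[bottleneck]


def wip_growth(stage_rates: dict[str, float]) -> float:
    """Units per hour of work-in-progress accumulating when the first stage runs flat out.

    Inventory growth is simply input rate minus output rate.
    """
    first_stage = next(iter(stage_rates))
    _, system_rate = line_throughput(stage_rates)
    return stage_rates[first_stage] - system_rate


# Hypothetical stages and rates (units per hour), in processing order.
rates = {"cutting": 30.0, "heat_treat": 12.0, "assembly": 20.0}
print(line_throughput(rates), wip_growth(rates))

# "Improving" a non-bottleneck stage leaves throughput unchanged and adds backlog.
faster_cutting = {**rates, "cutting": 40.0}
print(line_throughput(faster_cutting), wip_growth(faster_cutting))
```

<p>In this toy model, only raising the rate of the slowest stage changes what the system delivers.</p>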
<p>When I think about how these lessons apply to education, I think there is a deep perspective shift that could unlock significant gains, but the lessons are difficult to apply as the relative complexity and ambiguity are much higher than in traditional manufacturing. To start with, what if the goal were to maximize throughput (say, end-to-end how quickly students mastered their defined curriculum) rather than to minimize operating costs (maximize learning per dollar spent)? This would be a major cultural shift for US Public Education and would require better ways of certifying educational completion (e.g. a more socially-promoted GED).</p>
<p>Maybe at some point I’ll update this post or my book with some follow-up thoughts on the implications.</p>
<h3 id="100m-offers-by-alex-hormozi"><em>$100M Offers</em> by Alex Hormozi</h3>
<p>This book doesn’t use tactful language, but it does present a useful abstraction over sales and consumer decision-making. Alex claims that the value consumers attribute to an offering is driven by four factors:</p>
<ul>
<li>The Dream Outcome (/Current Pain) - What is the gap (and how clearly envisioned is it) between the customer’s current reality and their end goal? The bigger the problem you solve, the more valuable the solution. Framing affects the value proposition a lot.</li>
<li>The Perceived Likelihood of Achievement - How confident is the customer that your solution will actually solve their problem? This is an especially important factor for startups, as they usually have low brand credibility, but they can improve purchase confidence by demonstrating evidence of past success, giving customers trials, and by offering guarantees.</li>
<li>The Perceived Time Delay between Start and Achievement - People value immediacy, often disproportionately. This also plays into the factor above and below.</li>
<li>The Perceived Effort and Sacrifice - Outside of the financial cost, there’s the effort of learning how to use the solution, the potential changes that will have to be made as a consequence of the solution, and the ongoing effort that remains. Streamlining purchase and onboarding processes has an enormous impact on the attractiveness of an offering.</li>
</ul>
<p>Notably, these four factors are all primarily perceptual (although they are hopefully based on real considerations), and startups tend to overindex on just one or two of these dials, missing opportunities to capture more of the real value they are offering.</p>
<h3 id="team-of-teams-by-general-stanley-mcchrystal-with-tantum-collins-chris-fussell-and-david-silverman"><em>Team of Teams</em> by General Stanley McChrystal (with Tantum Collins, Chris Fussell, and David Silverman)</h3>
<p>Stan McChrystal was the director of the US Special Task Force in the war on Al Qaeda in Iraq (AQI), and relates how the task force was transformed over the course of the war from a siloed organization overindexing on individual efficiencies to a responsive organization that optimized for end-to-end organizational efficacy. While the context of the war was illuminating (I was too young to know much about it when it was happening), the lessons for management tie in quite nicely with the takeaways from <em>The Goal</em> and <em>The Lean Startup</em>.</p>
<p>For a military organization, success isn’t measured directly in fiscal sustainability, but there is a regular operational cadence (Find a target, Fix its location, Finish, Exploit for intelligence, and Analyze to find new targets). In the early stages of the war, the siloed constituent organizations in the task force worked efficiently as isolated teams, but long or missed handoffs between teams, misaligned strategic priorities, and internal fragmentation of knowledge led to many missed opportunities and overlong cycle times, meaning that once a target in the terrorist network had been captured, intelligence gathered from them was often no longer useful by the time it made it through the organization.</p>
<p>Similar to <em>The Goal</em> and <em>The Lean Startup</em>, a major theme is that organizations where individuals optimize for their own “efficiency” or “productivity” often lead to massive waste at an organizational level. If we think about the unutilized intelligence of the task force as inventory, and take <em>The Goal</em>’s target of maximizing throughput, organizational efficiency is all about minimizing the time between when intelligence is first collected and when it culminates in gathering more intelligence.</p>
<p>General McChrystal identifies two major thrusts that transformed the task force and led to a more than 17x increase in organizational throughput. The first was creating a “shared consciousness” — aggressively aggregating and sharing knowledge across the organization, redesigning physical spaces to encourage communication and transparency, and creating extensive rotational liaison programs to improve inter-team trust and functional understanding. The second major thrust was to empower distributed decision-making, reducing communication overhead and eliminating bottlenecks on in-situ responsiveness. McChrystal frames this as a major challenge to leadership to shift from being heroes to being gardeners. Where Taylor, Ford, and other business pioneers found system-level efficiencies by standardizing repetitive processes, many modern problems are now made difficult by their dynamism. It is responsiveness (or agility), not single-task efficiency, that determines the success of many modern organizations.</p>
<hr />
<h1>Restoring a Workstation</h1>
<p>2022-09-28 · http://sam-saarinen.github.io/insights/2022/09/28/computer-setup</p>
<h2 id="backstory">Backstory</h2>
<p>I have a nice computer. Like <strong>nice</strong>. Because I do a lot of machine learning for my work, I’m able to work more efficiently (and earn more) by taking advantage of a mid-tier GPU.
Unfortunately, my workstation crashed unexpectedly just as I was wrapping up two major projects and just before I temporarily relocated to Alabama for a Techstars accelerator program. It turns out the culprit was a failed boot sector on the primary hard drive (still not sure why it failed — maybe I just got unlucky). Anyway, I scrambled to get back to a working desktop environment as quickly as possible. Here are the steps I took.</p>
<h2 id="running-on-ubuntu">Running on Ubuntu</h2>
<ol>
<li>Use a small flash drive with an Ubuntu live disk written to it to install Ubuntu on a larger flash drive that will serve as the temporary OS drive.</li>
<li>Boot off of the large flash drive and complete hardware configuration (monitors, input devices)</li>
<li>Install Brave Browser, set up sync codes, log into Google accounts in the correct order (to preserve bookmark account association)</li>
<li>Install VS Code</li>
<li>Install VS Code Extensions:
<ul>
<li>Auto Rename Tag, ESLint, GitLens, Prettier, Visual Studio IntelliCode, HTML CSS Support</li>
</ul>
</li>
<li>Install git and the GitHub CLI, then log in to GitHub with <code class="language-plaintext highlighter-rouge">gh auth login</code></li>
<li>Install Node.js with <code class="language-plaintext highlighter-rouge">sudo apt install nodejs</code>, install npm if it was not installed alongside it (nvm is optional)
<ul>
<li>Then switch to the current stable Node.js release: <code class="language-plaintext highlighter-rouge">npm install n -g</code>, <code class="language-plaintext highlighter-rouge">sudo n stable</code></li>
</ul>
</li>
<li>Install the Firebase CLI with <code class="language-plaintext highlighter-rouge">sudo npm install -g firebase-tools</code>, then log in with <code class="language-plaintext highlighter-rouge">firebase login</code></li>
<li>Install Zoom</li>
<li>Install Piper for Mouse Settings
<ul>
<li>-> Use system settings dialog to remap keyboard shortcut for listing all applications (show the overview), because remapping the SUPER key doesn’t seem to work.</li>
</ul>
</li>
<li>Add Google Drive Accounts to Ubuntu Login</li>
<li>Log in to dropbox online</li>
</ol>
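<p>After working through a list like the one above, it can help to confirm that the command-line tools actually ended up on the PATH. The following is a minimal sketch, assuming the usual executable names for each tool (<code class="language-plaintext highlighter-rouge">code</code> for VS Code, <code class="language-plaintext highlighter-rouge">gh</code> for the GitHub CLI, and so on); adjust the list if your install uses different names.</p>

```python
import shutil

# Command names assumed from the steps above; "code" is the VS Code launcher.
REQUIRED_TOOLS = ["git", "gh", "node", "npm", "firebase", "code"]


def find_tools(names):
    """Map each command name to its resolved path, or None if it is not on PATH."""
    return {name: shutil.which(name) for name in names}


def missing_tools(names):
    """Return the command names that could not be found on PATH."""
    return [name for name, path in find_tools(names).items() if path is None]


if __name__ == "__main__":
    for name, path in find_tools(REQUIRED_TOOLS).items():
        print(f"{name:10s} {path or 'MISSING'}")
```

<p>This only checks that each executable can be found; it does not verify versions or login state.</p>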
<h2 id="restoring-windows">Restoring Windows</h2>
<p>After the Techstars program concluded [this section is an edit to the original post], I had the breathing room to replace the hard drive and try to get my Windows environment working again. Here’s what I did:</p>
<ol>
<li>Write Recovery Disk ISO to flash drive (in Ubuntu, can use “Disks” program)
<ul>
<li>Note: may require the OEM option, or something. Needs some hardware drivers pre-installed?</li>
</ul>
</li>
<li>Install Windows on the new hard drive</li>
<li>Install Brave (Download using Edge)</li>
<li>Use NVIDIA GeForce Experience to install GPU Drivers
<ul>
<li>Install CUDA using the <a href="https://developer.nvidia.com/cuda-toolkit">NVIDIA Installer</a></li>
<li>Install cuDNN using the <a href="https://developer.nvidia.com/cudnn">NVIDIA Local Install Link</a> (or possibly the Python library nvidia-cudnn?)</li>
</ul>
</li>
<li>Install the Armoury Crate Windows app (prompted automatically) to control case lighting.</li>
<li>Download Mouse Software
<ul>
<li>Set up mouse profile (shortcuts for copy, paste, list active programs, backspace, enter)</li>
</ul>
</li>
<li>Add Google Drive Sync</li>
<li>Add Dropbox Sync</li>
<li>Install Anaconda
<ul>
<li>Use <code class="language-plaintext highlighter-rouge">conda</code> to uninstall and reinstall pytorch (with CUDA support) if necessary.</li>
</ul>
</li>
<li>Install VS Code
<ul>
<li>Set default tab spacing to 2 and line wrap to true.</li>
<li>VS Code Extensions: Auto Rename Tag, ESLint, GitLens, Prettier, Visual Studio IntelliCode, HTML CSS Support</li>
</ul>
</li>
<li>Install git (https://git-scm.com/) and GitHub CLI</li>
<li>Install node (https://nodejs.org/en/download/), then Firebase and Ionic (<code class="language-plaintext highlighter-rouge">npm install -g @ionic/cli</code>, <code class="language-plaintext highlighter-rouge">npm install -g firebase-tools</code>).</li>
<li>Install Steam (and Epic Games, and EA Launcher)</li>
<li>Install John’s Background Switcher</li>
<li>Install Zoom</li>
<li>Install Razer Kiyo Webcam Software (Razer Synapse)</li>
<li>Install Microsoft Teams</li>
<li>Install LibreOffice</li>
<li>Push these instruction updates to GitHub Pages</li>
</ol>
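<p>To decide whether the PyTorch reinstall mentioned in the Anaconda step is necessary, a quick check from Python is usually enough. This is a minimal sketch that assumes PyTorch is imported under its usual module name <code class="language-plaintext highlighter-rouge">torch</code>; it degrades gracefully if the package is missing.</p>

```python
def describe_gpu_support() -> str:
    """Summarize whether PyTorch is importable and whether it can see a CUDA device."""
    try:
        import torch
    except ImportError:
        return "PyTorch is not installed in this environment."
    if not torch.cuda.is_available():
        return f"PyTorch {torch.__version__} is installed, but no CUDA device is visible."
    return f"PyTorch {torch.__version__} sees CUDA device: {torch.cuda.get_device_name(0)}"


print(describe_gpu_support())
```

<p>If the middle message appears on a machine with a working GPU driver, the installed PyTorch build is probably CPU-only, which is the case where reinstalling with CUDA support helps.</p>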
<h2 id="unnecessary-commentary">Unnecessary Commentary</h2>
<p>A lot of the work I do uses cloud software, and all of my essential files are backed up to the cloud (either via Google Drive, Dropbox, or GitHub), so most of what I have to install is OS- or hardware-related software. Although I do occasionally play games on my desktop (game design is an academic hobby of mine), much of the gaming-related hardware and software I have fulfills business purposes. For example, the case has controllable RGB lighting, but I only got it because it provided the best options for ventilation and multiple fans. I also use the Logitech G600 mouse (originally intended for playing MMOs, I think) because it was the mouse I found with the most buttons available. By setting the buttons to perform operations useful for editing code (and more broadly, document and file manipulation), I’m able to do a larger fraction of my work without switching back and forth between the mouse and keyboard. I’m still on the lookout for more efficient input tools.</p>
<p>Thanks,<br />
- (S)am</p>
<h1>[List] Books I Recommend</h1>
<p>2022-05-18 · http://sam-saarinen.github.io/insights/2022/05/18/Books-I-Recommend</p>
<p>This is a page (that will be updated regularly) of books that I recommend reading. Although expanding population, increasing literacy, improving ease of communication, and benefit of history would seem to cause (by simple probabilities) most of the best books of all time to have been written fairly recently, you will also find a few classics (of various ages) here.</p>
<h2 id="general-reading">General Reading</h2>
<h3 id="the-bible">The Bible</h3>
<p>Why do I recommend the #1 bestseller of all time? Outside of the possibility of profound spiritual connection, the Bible underlies an enormous portion of contemporary discussion of morality and philosophy. For example, the Bible contains the first recorded statement<sup id="fnref:GoldenRule" role="doc-noteref"><a href="#fn:GoldenRule" class="footnote" rel="footnote">1</a></sup> of The Golden Rule (“do to others as you would have them do to you” - Matthew 7:12), a description of totally altruistic compassion (1 Corinthians 13:4-7), and a call for the equality of rulers and common people under the written law (Deuteronomy 17:18-20). The Bible also reflects a broad array of genres in ancient literature, and tracks multiple threads of cultural history across centuries. That said, the Bible is not for the faint of heart; for readers in all regions of philosophical standing, distance from the culture and context of the original audience(s) turns many of the more opaque passages into massive exercises in confirmation bias on the part of the reader.</p>
<h3 id="les-misérables-by-victor-hugo"><em>Les Misérables</em> by Victor Hugo</h3>
<p>How do I describe Victor Hugo’s masterwork? Intricate, authentic, and emotional. I first encountered the story of Les Misérables through the award-winning musical, but catchy musical motifs struggle to capture the richness of the book. Although Hugo’s observations on justice and government are anchored in his time, his portrayal of the diversity of human nature and the interwoven nature of society is timeless. One of my favorite moments is when Gavroche, a street urchin, watches from concealment as Montparnasse, a young criminal, attempts to mug Jean Valjean, an escaped and reformed criminal who appears to be a member of the gentry. Valjean overpowers Montparnasse, lectures him, and then gives him money freely. Gavroche steals the money from Montparnasse and leaves it for the nearby Father Mabeuf, an elderly man struggling to care for a friend of his. Mabeuf, an honest man, turns the unidentified money over to the police. In dire financial straits, he eventually dies in the midst of a populist revolution. In the end, none of the characters get what they wanted, nor what they deserved.</p>
<h3 id="the-three-body-problem-the-dark-forest-and-deaths-end-by-cixin-liu-chinese-刘慈欣-liú-cíxīn"><em>The Three Body Problem</em>, <em>The Dark Forest</em>, and <em>Death’s End</em> by Cixin Liu (Chinese: 刘慈欣 Liú Cíxīn)</h3>
<p>Cixin Liu’s award-winning and widely translated science fiction trilogy stands apart for its broad scope, examination of the intersection of technology and society in the context of a vast universe, and poetic reflection on human rationality in the face of irrational tragedy. I don’t agree (or at least, don’t want to agree) with the series’ representation of the strength of collective human envy as a self-destructive force, but I deeply appreciate the way that the events of the books are informed by (and in some cases directly driven by) the Chinese Cultural Revolution and the Cold War. The books present some intriguing ideas around the Fermi Paradox (and Drake’s Equation), the serendipity of technology, and the fate of the universe.</p>
<h3 id="harry-potter-and-the-methods-of-rationality-by-eliezer-yudkowsky"><em>Harry Potter and the Methods of Rationality</em> by Eliezer Yudkowsky</h3>
<p>Bear with me, here. Imagine an exploration of a soft-fantasy universe by a cold-minded rationalist that serves as an object lesson in empiricism, cognitive bias, and human nature. Further imagine that this work contains well-written characters that grow over time, have understandable weaknesses, and reason authentically through differences in opinion. Finally, imagine that this 1800-page work is a fan-fiction based on the wildly popular Harry Potter series, and effectively leverages the source material to tell an entirely novel story. This is <a href="http://www.hpmor.com/">Harry Potter and the Methods of Rationality</a>. Although this book requires a fair amount of investment and cultural context to fully appreciate, I consider this book universally recommended reading.</p>
<h3 id="guns-germs-and-steel-by-jared-diamond"><em>Guns, Germs, and Steel</em> by Jared Diamond</h3>
<p>Why was there such an asymmetry between the Spanish conquistadors and the Aztec or Inca when they collided in the 1500s? This is the question that <em>Guns, Germs, and Steel</em> attempts to answer systematically. It presents a compelling argument that the ultimate cause was primarily geography: the land area and length of contiguous zones with compatible climates in Eurasia drove many of the natural advantages that led to faster development of population, technology, urbanization, infectious diseases, and access to historical information. Although the validity of the argument doesn’t depend on the conclusion, it also happens to be very uplifting, offering a clear rationale for dramatic inter-cultural imbalances of power that does not depend on luck or unfounded assertions about regional differences in intrinsic human characteristics.</p>
<h3 id="the-evolution-of-everything-by-matt-ridley"><em>The Evolution of Everything</em> by Matt Ridley</h3>
<p><em>The Evolution of Everything</em> attempts to apply the principles of evolution by natural selection to draw conclusions in a broad array of disciplines and on a broad set of ideas. As an example, the information present in a given piece of technology can be copied and reproduced with modification. Whether due to market forces or natural causes, only some instances of technology survive to inspire imitation, leading to better-adapted design over time. The oceans themselves drive innovation in boat-making.</p>
<p>Although I think some of the claims of the book may be overreaching (or even logical non sequiturs), where else can you find specific predictions for education, technology, economics, eugenics, personality, and the internet all in one place? A book that I value for its provocative and thoughtful speculation.</p>
<h2 id="specific-topics">Specific Topics</h2>
<p>See my list of <a href="https://sam-saarinen.github.io/insights/2023/06/06/Books-on-Doing-Good">Books on Doing Good</a> for recommendations with a practical angle.</p>
<hr />
<div class="footnotes" role="doc-endnotes">
<ol>
<li id="fn:GoldenRule" role="doc-endnote">
<p>As with just about any statement about the Bible, this claim tends to attract opposition and qualification. For readers interested in a broader historical survey of reciprocation in ethical maxims, there are many indices available upon a quick search for “Golden Rule”. <a href="#fnref:GoldenRule" class="reversefootnote" role="doc-backlink">↩</a></p>
</li>
</ol>
</div>
<h1>Why Student Data Privacy Matters</h1>
<p>2021-12-10 · http://sam-saarinen.github.io/insights/2021/12/10/student-data-privacy</p>
<p><em>TL;DR:</em> Student data privacy isn’t (just) about student safety; it’s about educational efficacy.</p>
<h2 id="introduction">Introduction</h2>
<p>In the US, there are a variety of laws protecting student data and data of minors (who are typically full-time students). The most well known is FERPA, which regulates the disclosure of personally identifiable data by schools that receive government funding (public schools, for example). As an education consultant and analytics provider, I generally don’t have to worry about disclosure regulations, as the data is anonymized before I see it, but thinking about data access and privacy controls is still a very important part of what I do as an educator.</p>
<p>In fact, I think laws around student data privacy have made teachers more aware of issues of student safety (for example, protecting students with complicated family situations, and minimizing predatory advertising) at the expense of blinding teachers to some of the strictly pedagogical issues stemming from a lack of student data privacy. In this (brief) essay, I hope to highlight why student data privacy should matter to every teacher, independent of regulatory legislation, and to illustrate some pervasive violations of student data privacy that are undermining our public education system.</p>
<h2 id="pedagogical-implications-of-privacy">Pedagogical Implications of Privacy</h2>
<p>As much as we might think data privacy is about the <strong>physical and fiscal</strong> security of our students, it is even more important to their <strong>social and emotional</strong> security. <em>In short, data privacy gives students the freedom to try something they might be bad at, so that someday they might be good at it.</em></p>
<p>As a society, I think we generally attribute too much of success to the idea that some people are “born for greatness”, “inherently gifted”, “talented”, or “naturals”. I think this way of thinking is attractive because it exonerates us of our lack of effort, but it is also a fairly large misrepresentation of how skill and ability develop. While the difference between the fastest runner and the second-fastest runner in the world might be due to genetics, luck, or circumstance, the difference between either of them and the vast majority of the population is the thousands of hours they have spent running. My wife and I recently received our first child, a son, and we’ve been delighted to watch his development over his first few weeks. While we love him deeply, it’s very apparent that he lacks many of the basic skills he will need in order to survive, as we all did at that age. My own mother is fond of reminding me that even though she needs me to teach her how to use her phone, I needed her to teach me how to use a spoon. It would be preposterous for my wife and me to decide during his first weeks of life that our son would never be a good pianist because he lacks the coordination to move his fingers independently, or that he would never be a great scientist because he is unable to run controlled experiments.</p>
<p>Consider language, or music. No one picks up a guitar for the first time and immediately plays a flawless rendition of Van Halen’s <em>Eruption</em>, unless perhaps for some reason they’ve already learned and thoroughly practiced the piece on a markedly similar instrument, such as a bass guitar. Most of the time, what we identify as “natural talent” and “beginner’s luck” is really a combination of transferred learning, audacity stemming from the Dunning-Kruger effect, deliberate focus on the part of the novice, and an increased tolerance for suboptimal performance. What we rarely recognize is that “beginners”, as measured by apparent experience, have a huge range in starting lines. When we mistake unseen advantages for talent, we tend to invest in developing that ability further, pouring resources, time, and emotional attention into the student in question, perpetuating the placebo of talent when they are compared to peers who haven’t received similar investment.</p>
<p>“Sure,” you say, “but what do sports and music have to do with data privacy?” Simple. If students are judged more harshly for having tried and failed than for never having tried, they will never invest the time and effort it takes to become good at something new. I have never met a student who was bad at math. But I have met lots of students who didn’t like the way math classes made them feel. Part of what makes difficult topics daunting is the emotional and social cost of being judged lacking. I know many students who would rather not turn in an assignment at all than turn in one partially completed or with potentially wrong answers. Musicians and athletes perform in public, but they practice in private. By giving students the ability to fail in private until they become good, we open up opportunities for marginalized or discounted participants to surprise us.</p>
<p>This is far from an academic discussion. To the contrary, the next section identifies (heretically) several staples of contemporary educational practice that are desperately in need of reformation.</p>
<h2 id="where-were-getting-it-wrong">Where We’re Getting it Wrong</h2>
<p><em>GPA</em>. When a student graduates from high school or university, they carry with them a single number that summarizes their performance at the institution. The Grade Point Average is typically computed in the following manner: each course is graded on a percentage basis, those percentages are binned into letter grades at convenient cut-offs (A, B, C, D, F), each of those grades is assigned an integer value between 4 and 0, and then the mean of those integers is taken (sometimes as a weighted average for courses of unequal duration). Students are often evaluated as candidates for jobs or future education partially on the basis of this number, with the natural consequence that many students don’t want to risk taking a class in which they might receive a low grade (regardless of how much they might learn). This creates perverse incentives all around. Students seek easier classes and avoid risking classes outside of their main interests, while educational institutions attempt to attract driven students by offering classes that are effectively watered-down versions of the main class (Physics for Pre-meds, Calculus for Non-majors, etc.). This produces a vicious cycle of pushing the lower bound of educational attainment, driving grade inflation, and siloing disciplines.</p>
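<p>The computation described above can be sketched in a few lines of Python. The 90/80/70/60 cut-offs, the credit weights, and the example transcript are assumptions for illustration; real institutions use a variety of scales.</p>

```python
from __future__ import annotations

# Assumed cut-offs (90/80/70/60); real institutions vary.
CUTOFFS = [(90, "A", 4), (80, "B", 3), (70, "C", 2), (60, "D", 1)]


def grade_points(percentage: float) -> tuple[str, int]:
    """Bin a course percentage into a letter grade and its integer point value."""
    for minimum, letter, points in CUTOFFS:
        if percentage >= minimum:
            return letter, points
    return "F", 0


def gpa(courses: list[tuple[float, float]]) -> float:
    """Credit-weighted mean of grade points; each course is (percentage, credits)."""
    total_credits = sum(credits for _, credits in courses)
    if total_credits <= 0:
        raise ValueError("at least one course with positive credits is required")
    weighted = sum(grade_points(pct)[1] * credits for pct, credits in courses)
    return weighted / total_credits


# Hypothetical transcript: (percentage, credit hours).
transcript = [(89.9, 4), (90.0, 4), (72.5, 2)]
print(round(gpa(transcript), 2))
```

<p>Note how much information the binning discards: in this hypothetical transcript, scores of 89.9 and 90.0 are a tenth of a percent apart but differ by a full grade point.</p>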
<p>There is another way. Brown University, for example, allows students to drop classes from their transcript (and GPA) any time up until the final weeks of classes, allows students to retake classes to replace their previous grade at any time, and allows students to take any class on a pass-fail basis. This isn’t the only solution, but I highlight it because it does have the intended consequence of encouraging students to focus more on cultivating their intellectual curiosity and breadth of knowledge than on maximizing an artificial metric of knowledge. One might think that these are differences in grading, but they are ultimately about student data privacy. Students are free, without social consequence, to have poor performance on the way to attaining good performance.</p>
<p><em>Transcripts</em>. This is similar to the issue with GPA, but has more to do with filtering for interests than for grades. If a student has “extra classes” in a field of interest, it can make them appear less focused than a peer who only took classes within their major (high school students have much less freedom to pick classes, so it is less of an issue for them). Students can be afraid that accessory classes in dance, martial arts, or the proverbial “underwater basket-weaving” will dilute the career-relevant classes they have taken. The pressure to present an application with a unified narrative can inhibit students from exploring unexpected connections between disciplines (like computer science, education, and game design, for example). If I play 6 instruments, nothing stops me from only listing piano as a hobby on my resume. But if I have to attach a transcript for my application to grad. school, I am unable to control which classes are seen by the admissions committee.</p>
<p><em>Deadlines and Final Exams</em>. I’ll admit, coordination between people often requires agreeing on times and places. But many of the deadlines in education are artificial: end-of-term; presentation dates; final exams; and assignment due dates, for example. The real issue here isn’t the deadlines themselves; again, many of them are necessary for coordination. The issue is that students don’t have control over when their progress is assessed. Suppose a student has a bad day on the date of a critical presentation or assessment. Hopefully their preparation is enough to carry them through, but if not, the evaluation they received will become a permanent (if small) part of their educational record. In real life, deadlines show up cyclically (publication submission deadlines, annual conference cycles, seasonal rises in business) or by mutual consent (client expectations, contractual agreements, initiation of a physical or biological process), and there are always future opportunities. Because students don’t have control over when they are observed, measured, and evaluated (or when their performance is made public), short-term thinking is rewarded, and long-term thinking is discounted. Students have much higher relative rewards for cheating, cramming for tests, and optimizing their grades than they would if there were multiple opportunities to be assessed. This is a privacy issue in the same sense that letting musicians practice in private is a privacy issue - letting someone decide when they will make their performance public enables students to persevere in developing new skills, even if they can’t commit the same amount of time per day as their peers.</p>
<h2 id="how-we-can-do-better">How We Can Do Better</h2>
<p>If you’ve made it this far in my essay, I hope you’ll hear me out on two calls to action - one cognitive and one practical.</p>
<p><em>Cognitive call to action:</em> Start thinking about student privacy as the freedom to be bad before becoming good. When students have the freedom to decide what is made public to whom and when, they will be much more willing to take intellectual risks, persevere through difficulty, and pursue learning over credentialing.</p>
<p><em>Practical call to action:</em> Design systems that allow students to opt in to sharing their data, and give them the ability to select what subset of their data they make available. This includes sharing with teachers, other students, and potential employers. If you are a teacher or administrator, give students the ability to retake assessments (say, for the same class the following year) and to select which classes and grades are shared with others, and give them the emotional and social security to fail on the path to success.</p>
<p>If you’re interested in designing education systems that are both practical and effective, feel free to reach out.</p>
<p>Thanks,<br />
- (S)am</p>
<h1>[List] Education Datasets</h1>
<p>2021-10-12 · http://sam-saarinen.github.io/insights/2021/10/12/Education-Datasets</p>
<p>Education research can be challenging due to the expense, time, and difficulty in collecting granular student data. Existing datasets can be used to test modeling techniques and assumptions, but relatively few datasets with student-level interaction or response records have been made available due partly to (somewhat justified) concerns about the difficulty of maintaining student privacy and compliance with FERPA and NSF IRB protocols. That said, there are lots of kinds of data that are very useful to education research while also being completely anonymized.</p>
<p>There are a variety of official sources for aggregate data related to demographics and standardized test scores at the school level - these are useful for evaluating the effects of large-scale policy decisions, but are of limited usefulness in the design of new models for assessment, knowledge acquisition, or memory. This page will serve as a (periodically updated) list of publicly available datasets useful for conducting research of the latter kind.</p>
<h2 id="the-fracsub-dataset">The FracSub dataset</h2>
<p>The Fraction Subtraction Dataset consists of correctness ratings on 20 assessment items by 536 different students. The assessment items all involved computing the difference of two fractions.</p>
<ul>
<li>Original Data: http://staff.ustc.edu.cn/~qiliuql/data/math2015.rar</li>
</ul>
<p>[More coming soon…]</p>
How Your Business can Benefit from AI2019-05-11T00:00:00+00:00http://sam-saarinen.github.io/insights/2019/05/11/how-your-business-can-benefit-from-ai<p><img src="/assets/chang-duong-1170439-unsplash.jpg" alt="Picture: Doctor at a Computer" />
<em>What can AI do for your business?</em></p>
<p>Artificial Intelligence (AI) has captured the public imagination, and it seems that every week there’s a new stance on issues ranging from the regulation of autonomous vehicles to the privacy of users in an era of big data. But nearly all of these discussions are focused on the implications of new technologies and the consequences for big businesses. Almost no one is talking about the practical implications, here and now, for everyday companies and businesses and for their customers. That’s what this article is for. By the end, you will understand the cost associated with using AI technologies (it’s cheap), you will see three opportunities for innovation in your own business, and you’ll be able to form realistic predictions of what specific AI techniques might do for your business.</p>
<p>To start things off, let’s make it clear what we mean when we say “AI”.</p>
<h2 id="what-is-ai">What is AI?</h2>
<p>When AI was first conceived, it was in response to the question “Can Machines Think?” AI was purposefully anthropomorphized in order to consider how we might answer that question. When researchers in the nascent field of computer science convened in 1956 at Dartmouth, their clear ambition was to create Artificial Intelligence that worked creatively, generalized, and was able to use natural language (human language, in contrast to machine/programming languages). They failed miserably.</p>
<p>In fact, despite misplaced hype around chatbots, <a href="https://en.wikipedia.org/wiki/Sophia_(robot)#Controversy_over_hype_in_the_scientific_community">Sophia the Robot</a>, and text-synthesis tools, a deep understanding of what makes us uniquely human remains as elusive as ever. This “Strong AI” or “Artificial General Intelligence” is too far away to be anything more than a fantasy for most businesses.</p>
<p>In place of the robot geniuses that were sought, researchers (including myself) have mostly innovated in statistical analysis (loosely, “machine learning”). Most of the impressive accomplishments of AI (“weak/real AI”) in the past decade, including speech-to-text translation, boardgame-playing, and image classification (e.g. written character recognition, face recognition) have been built on these statistical techniques that were refined over the course of decades. These statistical techniques are the kind of AI this article will talk about.</p>
<p>Recent progress in these techniques has largely been driven by increasingly sophisticated mathematics, larger (and more) computers, and an all-time high of public and private funding for AI research. Fortunately, you won’t need any of these things to use implementations of recent AI techniques for your own business.</p>
<h2 id="what-can-ai-be-used-for">What can AI be used for?</h2>
<p>The main uses of AI for businesses are in <strong>SUMMARIZING</strong>, <strong>PREDICTING</strong>, and <strong>ACTING ON</strong> data. Producing value for your business is as simple as choosing what data to apply these techniques to, and integrating the results into your business process. There are many creative ways to leverage these techniques to derive value, but let’s talk about some obvious ones first.</p>
<p>All companies have customers/users/clients, and understanding those clients is critical to producing value. AI techniques from a subfield called <em>Unsupervised Learning</em> can allow you to summarize the data you have about your customers. Are they mostly the same age, or do they represent different age groups? What geographic areas are they from? Do they purchase your products for the same reasons? While you could apply traditional statistics to any one aspect of your customer data (plotting a distribution/histogram of ages, for example), AI can help you uncover the relationships between different dimensions. This can lead to new marketing strategies, horizontal expansion of services, or adaptive customer interactions.</p>
<p>Predictions (or inferences) can be generated using <em>Supervised Learning</em> techniques, and can be used to streamline triaging and response processes. For example, customer emails could easily be categorized into “general inquiries”, “scheduling questions”, “support questions”, or “sales inquiries”, and then forwarded to the appropriate person. (Without diving into the details, it should be apparent this can be cast as a straightforward statistical problem based on keywords like “question”, “appointment”, or “price”.) One of my first consulting jobs was to create such an email auto-classifier.</p>
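<p>To make the keyword idea concrete, here is a minimal sketch in plain Python. The categories come from the paragraph above; the keyword lists, the <code>classify_email</code> name, and the simple count-and-compare scoring rule are illustrative assumptions rather than a description of any production system. A real classifier would learn keyword weights from labeled emails instead of hard-coding them.</p>

```python
import re
from collections import Counter

# Hypothetical keyword lists for illustration; a real system would learn
# these (and their weights) from labeled emails.
KEYWORDS = {
    "scheduling questions": {"appointment", "reschedule", "schedule", "available"},
    "support questions": {"problem", "broken", "help", "error"},
    "sales inquiries": {"price", "quote", "purchase", "discount"},
}
DEFAULT_CATEGORY = "general inquiries"


def classify_email(text):
    """Return the category whose keywords appear most often in the text."""
    words = Counter(re.findall(r"[a-z']+", text.lower()))
    scores = {
        category: sum(words[keyword] for keyword in keywords)
        for category, keywords in KEYWORDS.items()
    }
    best_category = max(scores, key=scores.get)
    # Fall back to the general inbox when no keyword matches at all.
    return best_category if scores[best_category] > 0 else DEFAULT_CATEGORY


if __name__ == "__main__":
    print(classify_email("What is the price of a yearly plan?"))
    print(classify_email("Can I reschedule my appointment?"))
    print(classify_email("Hello, just saying thanks!"))
```

<p>Each email is then forwarded to whoever handles the returned category; anything that matches no keyword lands in the general inbox for a person to triage.</p>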
<p>Predicting the future (forecasting) is also critical to business decision making, and one of the most helpful things to predict is whether a customer is likely to buy a particular service, and when. This allows you to adapt - promoting the service or product that a customer is most likely interested in. If you don’t know for certain what a customer might be interested in, AI can also be used to efficiently collect more data by learning from the responses to its previous recommendations. These kinds of problems can be solved efficiently using <em>Reinforcement Learning</em> techniques.</p>
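<p>One of the simplest versions of “learning from the responses to previous recommendations” is an epsilon-greedy strategy: usually recommend whatever has worked best so far, but occasionally try something else to keep collecting data. The sketch below is an assumption-laden illustration, not a recommendation engine: the class name, the offer names, and the simulated acceptance rates are all made up.</p>

```python
import random


class EpsilonGreedyRecommender:
    """Pick which offer to show, learning from accept/decline feedback."""

    def __init__(self, offers, epsilon=0.1, seed=None):
        self.offers = list(offers)
        self.epsilon = epsilon  # fraction of the time we explore at random
        self.rng = random.Random(seed)
        self.shown = {offer: 0 for offer in self.offers}
        self.accepted = {offer: 0 for offer in self.offers}

    def acceptance_rate(self, offer):
        # Offers that have never been shown are treated as having a rate of 0.0.
        shown = self.shown[offer]
        return self.accepted[offer] / shown if shown else 0.0

    def recommend(self):
        if self.rng.random() < self.epsilon:
            return self.rng.choice(self.offers)  # explore
        return max(self.offers, key=self.acceptance_rate)  # exploit

    def record(self, offer, was_accepted):
        self.shown[offer] += 1
        self.accepted[offer] += int(was_accepted)


if __name__ == "__main__":
    # Simulated customers with made-up acceptance probabilities.
    true_rates = {"checkup": 0.05, "first aid class": 0.30, "flu shot": 0.15}
    recommender = EpsilonGreedyRecommender(true_rates, epsilon=0.1, seed=0)
    customers = random.Random(1)
    for _ in range(5000):
        offer = recommender.recommend()
        recommender.record(offer, customers.random() < true_rates[offer])
    print(recommender.shown)
```

<p>The <code>epsilon</code> parameter controls the tradeoff: a higher value collects data about every offer faster, while a lower value spends more recommendations on the current best guess.</p>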
<p>Let’s look at how we might apply one of these approaches to a specific business.</p>
<h2 id="an-example">An Example</h2>
<p>Suppose you run a small medical office - a private practice with more than 100 patients (enough clients that using statistics is a good idea). You can use AI to understand your patients and to discover ways to expand your business. First, use a clustering algorithm (unsupervised learning) to summarize your patient data by creating a short list of groups of similar patients. Second, examine each group’s properties to see what problems most of your clientele are faced with. The code to do this is quite simple:</p>
<div class="language-python highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="c1"># Suppose 'data' has been loaded in previously (one row per patient, numeric columns)
</span><span class="kn">from</span> <span class="nn">sklearn.mixture</span> <span class="kn">import</span> <span class="n">GaussianMixture</span>
<span class="n">n_clusters</span> <span class="o">=</span> <span class="mi">3</span>  <span class="c1"># Number of patient groups to look for (an assumed value; tune for your data)
</span><span class="n">model</span> <span class="o">=</span> <span class="n">GaussianMixture</span><span class="p">(</span><span class="n">n_components</span><span class="o">=</span><span class="n">n_clusters</span><span class="p">)</span>
<span class="n">model</span><span class="p">.</span><span class="n">fit</span><span class="p">(</span><span class="n">data</span><span class="p">)</span>
<span class="c1"># Each row printed below is the "average patient" at the center of one group
</span><span class="k">print</span><span class="p">(</span><span class="n">model</span><span class="p">.</span><span class="n">means_</span><span class="p">)</span>
</code></pre></div></div>
<p>Finally, you can take business actions (advertising, expanding services) to increase the reach of your business. For example, suppose that you find that most of your clients are young children or their parents coming in for a regular checkup, but there’s another group consisting mostly of teenagers with minor sports injuries. You might choose to advertise your health checkups with other local services catering to children (e.g. at children’s movies, local children’s clubs, or with the local library), and to offer sports injury first aid training around the time each sport’s season starts. Knowing who your customers are and where they come from is critical to acquiring new business.</p>
<h2 id="how-to-evaluate-ai">How to Evaluate AI</h2>
<p>This section is about comparing the value produced by AI to the immediate cost. Let’s start with understanding the value produced by AI.</p>
<p>Value produced by AI generally falls into one of two categories: new value created by novel insights or possibilities; or value from automating or improving on existing value-producing processes. The medical office example above benefitted from the first kind of value. AI was able to produce unique insight into a large pool of customers and thus inform refined advertising opportunities. This value can be estimated only loosely, and is derived from new business. An email auto-classifier produces value of the second type. This type of value can be easily estimated on the basis of changed revenues or expenses.</p>
<p>Now what determines the cost of using AI? If you have a dedicated developer, they can easily learn about existing AI techniques from an <a href="https://www.packtpub.com/big-data-and-business-intelligence/hands-artificial-intelligence-small-businesses-video">online tutorial</a>. The cost of this training (and a raise for the developer) is pretty much always worth it, and will allow you to capture some of the low-hanging opportunities afforded by AI. However, your circumstances might require additional expertise, in which case you might contract with a consultant or hire someone with a graduate degree specializing in a technology you’re interested in. There are a few reasons you might do this:</p>
<ol>
<li>The innovation should be applied at a very large scale, or in a high-risk setting.</li>
<li>You want the benefit of additional experience when identifying business opportunities that AI could create.</li>
<li>Your task requires novel techniques or above-average performance.</li>
</ol>
<p>In any of these settings, it may be worth hiring an expert. But how much should you pay them? Experts typically have high hourly rates, due to their advanced training, specialized experience, and relative scarcity. However, many experts are willing to contract for a nominal flat rate, plus a fraction of the value they produce (akin to royalties). Regardless of the option you choose, your estimate of created value should determine what level of expertise you are willing to pay for. For a mid-to-large company with a substantial online retail market, even a 1% increase in sales is likely well worth the cost of an AI expert.</p>
<p>AI produces the most value in situations where there is a lot of data available. That said, most of that value is untapped unless paired with human creativity and concern for clients. AI produces the most value when diligent businesses use AI as a tool to better serve and relate to their customers.</p>
<p><img src="/assets/arvin-chingcuangco-1337417-unsplash.jpg" alt="Picture: Doctor and Patient" />
<em>AI produces value by improving your ability to relate to and serve your clients.</em></p>
<p>If you’re interested in learning more about AI and how it can be used to benefit your business, check out my video course <a href="https://www.packtpub.com/big-data-and-business-intelligence/hands-artificial-intelligence-small-businesses-video">“Hands-On Artificial Intelligence for Small Businesses”</a>, because in less than a weekend, you can develop the skills to use AI libraries (in Python) and apply them to data from your own business.</p>
[List] Tools I Use2019-01-12T00:00:00+00:00http://sam-saarinen.github.io/insights/2019/01/12/Tools-I-Use<p><strong>Updated 2021-10-05</strong></p>
<p>In the course of my work, I find myself often testing, adopting, and switching tools for tasks I do frequently. Although one can search for opinions about the best tool for a given task, sometimes it’s not even clear what those task divisions should be. For my own organization (and hopefully the benefit of readers with similar problems), I’ve accumulated the tools that I use in my own workflow below. I expect to create follow-up posts as new tools come along, as my needs change, or as I have time to add more notes (especially about documentation and/or books).</p>
<h2 id="cloud-storage">Cloud Storage</h2>
<p>For backing up data, synchronizing across devices, and sharing with others/collaborating.</p>
<h3 id="options-ive-tried">Options I’ve tried</h3>
<ul>
<li>Google Drive</li>
<li>Dropbox</li>
<li>OneDrive (Microsoft)</li>
</ul>
<p>Of these, I like Google Drive the best, mostly because of its integration with Gmail, Google Docs/Spreadsheets/Presentations/Forms, and for its transparent permissions management.</p>
<h3 id="unorthodox-options">Unorthodox Options</h3>
<ul>
<li>GitHub</li>
</ul>
<h2 id="email">Email</h2>
<p>Email is vital for communication in today’s technology infrastructure. I use Gmail, largely because of the institutional licenses at my graduate and undergraduate universities.</p>
<h2 id="office-software">Office Software</h2>
<p>Here, I define office software as document editing, spreadsheet management, and slideshow editing. I use the Google suite, primarily because of its collaboration features, extensibility, and ubiquity. Google Forms is a nice bonus.</p>
<h2 id="general-purpose-programming-languages">General-Purpose Programming Language(s)</h2>
<p>Although I’ve become more of a polyglot with more education (yet also found more ways to use new paradigms in old languages), it’s convenient to have a small set of languages that cover a large swath of use cases.</p>
<p>For most purposes (especially computational experiments) I use Python 3 in the Jupyter Notebook environment (packaged with Anaconda). (In actuality, I typically use Google Colab notebooks, but running on a local kernel. I like the convenience of versioning and sharing through Google Drive, and the possibility of running in the cloud when necessary.) With NumPy, Matplotlib, and SciPy, I am able to quickly write experiments that also run sufficiently quickly. At other times in my life, my primary language has been Java, C++, or Mathematica. I have a soft spot for the richness of the debugging experience in Java using Eclipse, but I find my Python to be slightly less verbose, and a large number of open-source libraries are now built on the Python stack.</p>
<p>For web app development, I use TypeScript with React (library/framework) and Ionic (UI Library). I deploy to Google Firebase or GitHub Pages, depending on the backend requirements of the app. I’ve also tried Angular (also TypeScript), Rails (Ruby), and Django (Python), and I personally prefer the functional style and clean organization of React.</p>
<h2 id="deep-learning-library">Deep Learning Library</h2>
<p>I considered TensorFlow and PyTorch and chose to use PyTorch due to my perception that it was more succinct for the tasks I cared about, and for its more flexible introspection capabilities.</p>
<h2 id="todo-management-and-project-planning">ToDo management and project planning</h2>
<p>I’ve seen a few services that do this; I used Asana off and on for about two years. The key features for me are the ability to create nested lists (this is easy in Asana up to about 3 levels of depth, after which it becomes a pain), the ability to use comments to create “quest journal” updates, and the ability to organize tasks visually on boards. I recently discovered <a href="https://notion.so">Notion</a>, which has slightly nicer nesting and organizational capabilities. Notion also makes it easy to integrate group notes, documents, and data into your team workflow.
I thought I would use the scheduling and reminder capabilities more, but I find it’s more robust to just do that through my calendar.</p>
<h2 id="note-taking">Note-Taking</h2>
<p>I started using <a href="https://roamresearch.com">Roam</a>, which is nice because of its cross-linking, daily notes, and fast searching. I’m not sold on its use for general productivity management (such as task management), but it’s a great tool for writing and organizing thoughts. It also supports a wide variety of media types.</p>
<h2 id="repository-management">Repository Management</h2>
<p>I’ve used GitHub and GitLab. I tend to lean toward GitHub, just because all of the scientific repositories I care about seem to be hosted there.</p>
<h2 id="chrome-extensions">Chrome Extensions</h2>
<p>I use Brave (built on Chromium), and have found a number of tools that make web use better. One is called Tabbie, which allows you to save and reload collections of Chrome tabs for later. This has reduced (although not eliminated) my tendency to leave windows open with dozens of tabs for reference for each project. For one-time tabs you want to queue up, I’ve also found OneTab (great for groups of tabs) and Reading List (great for one-off articles) to be helpful. Another tool that I’ve found surprisingly useful is simply called “Video Speed Controller”, which allows playback of any HTML5 videos (YouTube, Netflix, and almost everything else, at the moment) at arbitrary speed. It also has keyboard shortcuts for speeding up and slowing down the video, which make it easier to scan to the most important parts in e.g. a lecture or research talk. Finally, I broke down and installed Grammarly because it catches a broader variety of grammatical errors than the browser’s built-in spellcheck.</p>
<h2 id="papercitation-management">Paper/Citation Management</h2>
<p>I’ve used Zotero and Mendeley, and have a slight preference for Mendeley due to its group/shared folders and note-taking ability. Either tool is far better for me than using a physical filing system.</p>
<h2 id="feed-management">Feed Management</h2>
<p>For the time being, I’m using Feedly to track RSS/Atom feeds. There may be better aggregation/digest tools, but I haven’t spent a lot of time looking.</p>
<h2 id="presentation-recording">Presentation Recording</h2>
<p>I recently learned that Microsoft PowerPoint allows you to record narration/timings and that you can save/export as an MP4 video. Although for general video processing needs I like Premiere (Adobe Creative Cloud) or OpenShot (a pretty good open source alternative), there is really no comparison when it comes to recording presentations - PowerPoint has easy-to-use graphics and animation capabilities and individual slides can be re-recorded without interrupting the total presentation. I plan to use this for all of my informational videos from now on that don’t rely heavily on external footage. (If anyone is looking for a more general screencasting software package, OBS Studio has a free download that works quite well.)</p>
<p>I hope this was helpful!</p>
<p>Thanks,<br />
- (S)am</p>
[Paper] How to personalize education at scale.2018-10-06T00:00:00+00:00http://sam-saarinen.github.io/insights/2018/10/06/How-to-Personalize-Education-at-Scale<p>Education (conversely, learning) is one of the quintessential human experiences. It is also one of the most practical of human endeavors, leading directly to higher income and quality of life, greater social mobility (one of the strongest indicators of societal fairness), and greater tolerance for other people and beliefs.</p>
<p>Differences in education quality are also one of the greatest sources of inequity in the contemporary US, and across the globe. All else being equal, families (or communities) with more money can invest more in education, which does produce results. The issue is not that these families can afford high-quality education; the issue is that there are many families that cannot.</p>
<p>To illustrate the magnitude of this effect, Bloom (of “Bloom’s Taxonomy”) wrote in 1984 about the “Two-Sigma Problem” - that students who receive one-on-one tutoring perform more than two standard deviations above the average of students who receive classroom (1 teacher to ~30 students) instruction. This means that in randomized trials, the average student who receives one-on-one instruction performs better than 98% of students in a regular classroom. Clearly, personalized instruction is effective.</p>
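<p>The 98% figure follows from the two-standard-deviation gap if we assume classroom scores are roughly normally distributed (a simplifying assumption for this back-of-the-envelope check, not a claim from the study itself). A short calculation with Python’s standard library:</p>

```python
from statistics import NormalDist

# Assumption: classroom scores follow a normal (bell-curve) distribution,
# measured in units of classroom standard deviations.
classroom = NormalDist(mu=0.0, sigma=1.0)

# A tutored student scoring two standard deviations above the classroom mean
# outperforms this fraction of classroom students.
fraction_outperformed = classroom.cdf(2.0)
print(f"{fraction_outperformed:.1%}")  # prints 97.7%
```

<p>Rounding 97.7% gives the “better than 98%” figure quoted above.</p>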
<h2 id="the-problem---cost">The Problem - Cost</h2>
<p>The problem is that under traditional educational paradigms, personalization of education is expensive. It requires a tutor or instructor for every student, which just isn’t feasible in most economies. Engaging parents, leveraging near-peer tutoring (students who passed the class previously), and connecting students with mentors in the community are all great ways to move toward personalized learning for every student. But unfortunately, these do not address the fundamental cost of one-on-one tutoring; they only spread the cost out over more people. Furthermore, these individuals don’t necessarily have the pedagogical training and domain expertise that a dedicated tutor or professional teacher will have acquired through education degrees covering scientifically validated instructional approaches or through years of experience with scores of students.</p>
<p>In an attempt to address this problem in a cost-effective way, computerized tutors have become a popular area of research, with some notable successes. Programs like the <a href="http://pact.cs.cmu.edu/pubs/koedingercorbett06.pdf">Carnegie Math Cognitive Tutor</a>, <a href="https://www.sri.com/sites/default/files/publications/2014-03-07_implementation_briefing.pdf">Khan Academy</a>, or <a href="http://static.duolingo.com/s3/DuolingoReport_Final.pdf">DuoLingo</a> have achieved wide adoption with measured results, in some cases approaching the efficacy of personalized instruction. All of these successes have come from domains that are problem-rich; mathematics and foreign language instruction lend themselves easily to automatically graded questions that give a good idea of what students know. These programs all track what students know and are likely to get right, ensuring that the instruction provided is always appropriate to what the student has mastered. However, they do not accommodate differences in student background, interests, interpretation, or motivation. The question is, how can we extend these amazing results to other subjects like history, writing, or art?</p>
<h2 id="a-solution---machine-learning">A Solution - Machine Learning?</h2>
<p>One of the most exciting technologies for adapting and personalizing processes at scale is machine learning. The data-driven processes that allow Facebook to recognize and tag faces, Google to guess what you’re looking for, and Netflix to recommend a movie you might like can be used to recognize different types of learners, suggest curricula, and recommend resources that can help students understand new topics. The technology is ripe for adaptation to education, if only we can solve a few small problems:</p>
<p>First, we need a way of measuring when we have succeeded. We need a measure of what students have learned, and it needs to be something that’s specific enough to track the benefits to individual students, short enough to be used whenever needed, and cheap enough to create that we can make one for every topic we might want to teach. One-on-one interviews are reliable but expensive to administer, while automatically graded exams can be either too coarse or too difficult to design. To address this problem, I’ve been working on methods for generating quizzes using crowd-sourcing and machine learning, and some collaborators and I recently had a paper accepted on this topic: “Harnessing the Wisdom of the Classes: Classsourcing and Machine Learning for Assessment Instrument Generation”. In this paper we use a <a href="https://en.wikipedia.org/wiki/Multi-armed_bandit">Multi-Armed Bandit Process</a> to select questions from crowd-sourced contributions that are the most informative in distinguishing levels of student knowledge. (More on this in a future blog post!)</p>
<p>Second, we need a way of modeling students so we can predict how they will respond to different instruction. Part of that model is what they know, and the computerized tutors have shown that mastery tracking is enormously helpful, but students are more than just buckets of knowledge. The question is, what other features are helpful for predicting how students learn? Some collaborators at a company in China called <em>Special A Education</em> have suggested that personality assessments, such as the MBTI or HEXACO, may be helpful for augmenting our model of students. They have also suggested that teachers may be able to identify character traits and students may be able to identify interests. As data is collected with more students, we can rigorously evaluate which of these additional features (or others) are most useful for determining how individual students learn best. Although it can be tempting to <a href="https://files.eric.ed.gov/fulltext/EJ767768.pdf">simply hand pick features that we think might be helpful</a>, machine learning can help us to systematically identify features that are <a href="https://www.researchgate.net/profile/Cedar_Riener/publication/249039450_The_Myth_of_Learning_Styles/links/0046353c694205e957000000.pdf">genuinely and statistically reliable</a>. An exciting new direction is using open-ended <a href="https://maateachingtidbits.blogspot.com/2018/04/the-exercise-with-no-wrong-answer.html">“Notice and Wonder” activities</a> to generate topic-specific features that might be useful for modeling students.</p>
<p>Third, we need a way of systematically and efficiently determining what the best way to teach each type of student is. This is why I am in grad school right now, working in Reinforcement Learning. The idea behind reinforcement learning is to create algorithms that can improve over time based on signals of how well they have succeeded. The canonical reinforcement learning problem is the Markov Decision Process (MDP). MDPs have 4 parts (although it can change depending on how pedantic the RL researcher is feeling): (S, A, R, T). S stands for the state space: the set of possible states of the universe. In the education setting, this includes how much the student knows, what their interests are, and the other features we determine by solving the previous problem. A stands for the action space. This is the set of actions the system can take to impact the world. In education, this might be showing an educational video, having the student read an article, play a game, create something, or complete an activity (online or with a teacher or peer in person). R stands for reward, and is the way that we measure success. In our setting this might be how much they know when we test them, or how quickly they master a topic. T represents the transitions in state. If I teach student B using action C, how will the state of the student change?</p>
<p>Clearly, education (and in fact, pedagogical experimentation) can be cast as a reinforcement learning problem. The problem is, it’s a <em>really big</em> reinforcement learning problem. There are so many different types of students, so many different ways of teaching them, and so many different things to teach that there’s no way we can just try every combination and see what works. We have to <a href="https://infoscience.epfl.ch/record/177246/files/srinivas_ieeeit2012.pdf">generalize across different contexts</a>, deal with <a href="https://core.ac.uk/download/pdf/82606478.pdf">imperfect knowledge of the student</a>, and hopefully notify teachers when it would be really helpful to have a <a href="http://www.aaai.org/ocs/index.php/AAAI/AAAI17/paper/download/15031/14411">new way of teaching certain students</a>. But I’ve also just linked to a number of papers from people who have invented techniques that might be able to overcome these difficulties. We have unprecedented access to data and ability to disseminate knowledge. The time is right for us to use this cutting-edge technology to address one of the most exciting possibilities of our time: giving <em>every</em> student the best education they can get.</p>
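<p>To make the (S, A, R, T) notation concrete, here is a toy MDP for a single student learning a single topic, solved with value iteration (a standard planning method for small MDPs whose transitions are known). Everything specific here is invented for illustration: the three knowledge states, the two actions, the transition probabilities, and the reward. A real system would have to learn T from data rather than being handed it.</p>

```python
# All states, actions, probabilities, and rewards below are made up for illustration.
STATES = ["novice", "intermediate", "mastered"]   # S
ACTIONS = ["video", "practice"]                   # A

# T[(state, action)] -> list of (probability, next_state)
T = {
    ("novice", "video"): [(0.6, "intermediate"), (0.4, "novice")],
    ("novice", "practice"): [(0.2, "intermediate"), (0.8, "novice")],
    ("intermediate", "video"): [(0.1, "mastered"), (0.9, "intermediate")],
    ("intermediate", "practice"): [(0.5, "mastered"), (0.5, "intermediate")],
    ("mastered", "video"): [(1.0, "mastered")],
    ("mastered", "practice"): [(1.0, "mastered")],
}


def reward(state, next_state):
    """R: +1 at the moment the student first reaches mastery."""
    return 1.0 if state != "mastered" and next_state == "mastered" else 0.0


def q_value(state, action, values, discount):
    """Expected discounted return of taking `action` in `state`."""
    return sum(
        prob * (reward(state, nxt) + discount * values[nxt])
        for prob, nxt in T[(state, action)]
    )


def value_iteration(discount=0.9, sweeps=200):
    """Repeatedly back up state values, then read off the greedy policy."""
    values = {s: 0.0 for s in STATES}
    for _ in range(sweeps):
        values = {
            s: max(q_value(s, a, values, discount) for a in ACTIONS)
            for s in STATES
        }
    policy = {
        s: max(ACTIONS, key=lambda a: q_value(s, a, values, discount))
        for s in STATES
    }
    return values, policy


if __name__ == "__main__":
    values, policy = value_iteration()
    print(policy)
```

<p>Because future rewards are discounted, the resulting policy favors whichever action gets the student to mastery fastest from each state, which is one way of encoding the “how quickly they master a topic” reward mentioned above.</p>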
<h2 id="a-note-about-human-relationships">A Note about Human Relationships</h2>
<p>The goal of this article is not to push for computerized education because it is cheap. The goal of this article is to inspire us to work together on scalable personalized education, because it is effective. Electronic supplements to in-person education can free teachers working with groups of students to focus on important personal skills, to develop students’ communication and collaboration, and to inspire and display the qualities that make us uniquely human - our curiosity, empathy, and courage. Computerized tools can empower classrooms by removing or streamlining the mundane parts of learning, like assessment, memorization, and presentation of fact. Computerized tools can empower classrooms by providing data-driven support for novel pedagogical practices or learning activities. Computerized tools can empower classrooms by equipping interested parties like teachers, parents, and administrators with interpretable measures of what individual students have learned.</p>
<p>In the spirit of empowering human relationships, let’s work together for an amazing future!</p>
<hr />
<h5 id="this-article-is-based-on-the-paper-personalized-education-at-scale-which-i-recently-wrote-with-evan-cater-and-michael-littman"><em>This article is based on the paper <a href="https://arxiv.org/pdf/1809.10025.pdf">“Personalized Education at Scale”</a>, which I recently wrote with Evan Cater and Michael Littman.</em></h5>
<p>To cite, feel free to use:</p>
<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>@article{saarinen2018personalized,
title={Personalized Education at Scale},
author={Saarinen, Sam and Cater, Evan and Littman, Michael},
journal={arXiv preprint arXiv:1809.10025},
year={2018}
}
</code></pre></div></div>
<p>Thanks,<br />
- (S)am</p>
What Crowdfunding Is and Isn't2018-07-10T00:00:00+00:00http://sam-saarinen.github.io/insights/2018/07/10/what-crowdfunding-is-and-is-not<p><img src="https://sam-saarinen.github.io/assets/CrowdFunding.png" alt="Which Crowdfunding Platform and Why?" width="50%" /></p>
<p>A few years ago, I started an educational company with a friend I met while we were both volunteering at a high school for differentiated learners. We taught technology and design, ran community showcases, and studied and tested pedagogy intensely, if not rigorously. I went to grad school so that I could solve some difficult research problems related to improving education, but in the meantime, inventors renaissance, LLC has in turn started <a href="https://patreon.com/irGAMES">ir GAMES</a>. ir GAMES is an avenue for us to test out new interactive pedagogies and build a following, and it seems like everyone we talk to is really excited about what we’ve made. One of the most common questions we get is, “Do you have a Kickstarter or something?” Initially, we always responded with, “We’re really trying to fund everything ourselves,” but we’ve since realized that we were thinking about crowdfunding the wrong way. So without further ado, here’s a handy guide to how to think about crowdfunding:</p>
<h2 id="crowdfunding-is-not-investment">Crowdfunding <strong>is NOT</strong> “investment”</h2>
<p>Let me explain what I mean. First of all, by “crowdfunding” I really just mean Kickstarter, Indiegogo, or Patreon, which at the time of writing are the three most popular (by number of users) crowdfunding platforms in the US. Second, by “investment” I mean selling shares in the company. Maybe others don’t have this misconception, but we definitely felt like we would be losing something by using a crowdfunding platform. But in reality there is a small flat percentage fee for using the platform and processing transactions, and using any of the aforementioned sites doesn’t mean giving away any ownership in the company. From a backer’s perspective, it feels like investing, because money is paid up front for some eventual reward, but that eventual return is usually a product or service.</p>
<h2 id="crowdfunding-can-be-for-preorders">Crowdfunding <strong>CAN BE</strong> for preorders</h2>
<p>A much better way to think of crowdfunding platforms is as a way of collecting pre-orders. Small businesses (and even larger businesses) can fall into a kind of Catch-22 where:</p>
<ol>
<li>They don’t have enough capital yet to manufacture with economies of scale.</li>
<li>Getting capital requires purchases by individuals.</li>
<li>Individuals are much less willing to buy the product at the higher price point.</li>
</ol>
<p>While venture capital can help to break through to larger scales of production, pre-orders can serve a similar function, providing the capital up front to manufacture a product with the benefits of scale. The difficult part is that unlike investment, where only a few people have to be convinced to believe in your business and/or product, pre-orders require orders of magnitude more people to be willing to take a chance on you, albeit with a smaller amount of cash.</p>
<h2 id="crowdfunding-is-not-free-marketing">Crowdfunding <strong>is NOT</strong> free marketing</h2>
<p>Another misconception that we’ve had is that if an idea is good, it will automatically be discovered on a crowdfunding platform and raise enough capital to survive. The simple fact is that far more <em>ideas</em> get posted on crowdfunding platforms than ever get funded. Although organic discovery is possible, most platforms will only drive random visitors to your page if you are already generating a lot of traffic.</p>
<h2 id="crowdfunding-can-be-a-rallying-point-for-marketing-efforts">Crowdfunding <strong>CAN BE</strong> a rallying point for marketing efforts</h2>
<p>A crowdfunding page can be a good central location for people to follow your project. Many platforms have built-in integrations with social media, allow posting new content on the site, and make it easy to monetize or differentiate content. If you can keep people coming back to your crowdfunding page, you don’t have to dilute your efforts by managing content on multiple different platforms.</p>
<h2 id="what-did-we-choose">What did we choose?</h2>
<p>ir GAMES has just launched a page on <a href="https://patreon.com/irGAMES">Patreon</a>. This was a natural fit for us, since we’re developing many products, and we want to build a following of people who share our vision across all of them. We are well positioned to release ongoing benefits to subscribers, including newly released games, art, development/art/design blogs, and more. But we also plan on using other platforms for preorders of specific games. Many retailers and distributors want to see proven sales, and we see those other platforms as an intermediate step to a physical retail presence.</p>
<p>I hope this was helpful, and please, go check out <a href="https://patreon.com/irGAMES">our page on Patreon</a>!</p>
<p>Thanks,<br />
- (S)am</p>
Making Machine Learning Review Easier with Music Videos2018-05-08T00:00:00+00:00http://sam-saarinen.github.io/insights/2018/05/08/machine-learning-musical<p><a href="https://www.youtube.com/watch?v=DQWI1kvmwRg&list=PLVLBHV224RtDYfNuYh-9Ju5WHP18oKuiL"><img src="https://i.ytimg.com/vi/g15bqtyidZs/hqdefault.jpg?sqp=-oaymwEXCPYBEIoBSFryq4qpAwkIARUAAIhCGAE=&rs=AOn4CLBFpiUF3yhfI0bEY0RlEz3mVe9MPg" alt="Link to ML Music Video Playlist" /></a></p>
<p>I’ve had the privilege over the last semester of helping Michael (Littman, my advisor) to teach machine learning to a class of around 240 students. As the final drew nearer, we began to discuss the possibility of an extra credit assignment to help students review for the final and produce artifacts to help their peers. We had the crazy idea of having the students make music videos (like Schoolhouse Rock, for those who know what that is) explaining different topics from the course. What was even crazier is that more than a third of the students in the class took us up on it. The result is a YouTube playlist (linked above) for the class to review with, consisting of one video from several years ago by Michael in collaboration with Charles Isbell and Udacity, and about 16 videos produced by teams in the class. I was really impressed by the quality of work produced by the students, and feel motivated to consider other forms of student-contributed content.</p>
<p>For anyone interested in the specifics of the assignment, here’s the original description:</p>
<blockquote>
<p>Overview:<br />
Your task is to create a parody music video explaining a machine learning topic in a way that makes it easier for your classmates to review. You will be evaluated on technical correctness (is everything in the lyrics true), clarity of presentation (do the lyrics or visuals help someone understand the topic), and production value (is the video enjoyable to watch/listen to). In order to ensure that the content is technically accurate, lyrics should be submitted to (S)am at least one week before the deadline. He can also help clean up any difficult or messy sections, and is available throughout this assignment to help explain concepts, brainstorm lyrics, or connect you with production equipment or tools.</p>
<p>If your team produces a quality video, it is worth up to 5% on your final grade, about ⅔ of a homework assignment.</p>
<p>Tips:<br />
It is easier to parody existing songs than to write your own, although you are certainly welcome to write your own music if you prefer. A low-time-investment workflow might look like this:</p>
<ol>
<li>Choose a popular song and find a karaoke/instrumental version (.5 hours).</li>
<li>Choose the topic and brainstorm the content of each verse (1 hour).</li>
<li>Have some people work on ironing out the lyrics (2 hours) while others work on visuals for the video, such as pictures, equations, animations, or costumes/dance moves (2 hours).</li>
<li>[Send the lyrics to (S)am for review.]</li>
<li>Pick a location and obtain recording equipment (1 hour).</li>
<li>Record any video (1 hour) from at least two different angles [2 takes]. Do something to make it easy to synchronize the video (the music playing in the background, for example), and then you can switch easily back and forth to make a nice-looking music video. Alternately/additionally, record non-video voices while listening to the music through headphones (.5 hours) [which makes it easier to manage the balance of the vocals in editing].</li>
<li>Edit the video together using audio and video software (1 hour). Audacity and OpenShot work well, but Brown also has student licenses for some much nicer software, such as Adobe Premiere via the Creative Cloud.</li>
<li>[Upload and send the YouTube link to (S)am.]</li>
</ol>
<p>The total time for any one person under this scheme is about 6.5 hours, and good task division can reduce that. Using group messaging tools like GroupMe, Slack, or WhatsApp, and scheduling tools like Google Calendar, Doodle, or When2Meet can greatly facilitate team management. Convogo (Brown Startup: getconvogo.com) can be used to facilitate meetings, and Asana (asana.com) can be used for group task management and filesharing, although that might be overkill for this project.</p>
<p>Final Thought:<br />
We want you to succeed. Not only is this extra credit assignment meant to help you show your understanding of course topics and improve your grade, but this is meant to help your classmates as well (and maybe make you famous). If there is anything that is getting you stuck, or seems to require more time than outlined above, please get in touch with (S)am. He can connect you with resources, help with lyrics, explain or suggest topics or ideas, or even do featured performances or give editing tips. Although there is work involved, we don’t want this to be an undue burden on anyone, and we certainly don’t want any work invested to go to waste because a quality result wasn’t produced by the deadline.</p>
</blockquote>
<p>Thanks,<br />
- (S)am</p>
Creating a Personal Website - Using GitHub Pages and Jekyll2018-05-05T00:00:00+00:00http://sam-saarinen.github.io/insights/2018/05/05/creating-a-personal-website<p>After 2 years at Brown, I could no longer avoid hacking together a semi-permanent site to archive and host work that I want to be available to others. There were a few options that I considered:</p>
<ol>
<li>Building a server in Rails for me to post to (overkill)</li>
<li>Hosting raw HTML pages myself or through Brown (brittle and temporary)</li>
<li>Using a prebuilt hosting service, such as Google Pages or Wix (probably limiting in terms of interactivity and front-end development)</li>
</ol>
<p>Fortunately, I found (at the suggestion of a peer) GitHub Pages, which allows static/frontend site hosting using GitHub (for free). GitHub has support for <a href="http://jekyllrb.com">Jekyll</a>, a static-page generator that eases templating, cross-linking, and RSS feed creation. Jekyll supports content in a variety of markup languages, including Markdown and HTML. One could imagine implementing a backend using a separate service for more complicated apps, but these technologies seem to be more than sufficient for personal site management. I found <a href="http://jmcglone.com/guides/github-pages/">this introduction</a> to be quite helpful.</p>
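<p>To give a rough sense of what writing for Jekyll looks like, here’s a minimal sketch of a post file. The title and the <code>post</code> layout name are assumptions for illustration, not a copy of this site’s actual source, and the layout would have to exist in the site’s <code>_layouts</code> folder.</p>
<pre><code>---
# Front matter: YAML between the two "---" lines.
# Assumes a layout named "post" exists in _layouts/ (hypothetical here).
layout: post
title: "Creating a Personal Website"
---
&lt;p&gt;Post content goes here.&lt;/p&gt;
</code></pre>
<p>Saved as something like <code>_posts/2018-05-05-creating-a-personal-website.html</code>, Jekyll picks up the date and slug from the filename and wraps the body in the named layout. Using a <code>.md</code> extension instead would let the body be written in Markdown.</p>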
<p>Thanks,<br />
- (S)am</p>