Sunday, November 3, 2019

Review: Clean Agile, by Robert C. Martin, and More Effective Agile, by Steve McConnell

This started out as a review of McConnell's book, but Just-In-Time, my pre-order of Uncle Bob's book arrived Friday. Ah, sweet serendipity! I read it yesterday, and it fits right in.

I have no idea what the two authors think of each other. I don't know if they're friends, enemies, or frenemies. I don't know if they shake their fists at each other or high-five. But as a software developer, I do believe they're both worth listening to.

I've read most of the books in Martin's Clean Code series. I'm a big fan. He was one of the original signatories of the Agile Manifesto.

A recent post by Phillip Johnston, CEO of Embedded Artistry, set me off on a path reading some of Steve McConnell's books and related material. I've become a big fan of his as well.

Week before last, I read McConnell's Software Estimation: Demystifying the Black Art, 2006. Last week, I read his new book More Effective Agile: A Roadmap for Software Leaders, which just came out in August and is the one I'm reviewing here.

This week, I'm reading his Code Complete: A Practical Handbook of Software Construction, 2nd edition, 2004, and Software Requirements, 3rd edition, 2013, by Karl Wiegers and Joy Beatty. Or maybe over the next few weeks, since they total some 1500 pages. (I note that in the Netflix documentary series "Inside Bill's Brain: Decoding Bill Gates", one of his friends says Gates reads 150 pages an hour. That's a superpower, and I am totally jealous!)

These are areas where software engineering practice has continually run into problems.

The Critical Reading List

Martin's and McConnell's new books are excellent, to the point that I can add them as the other half of this absolutely critical reading list:
  • The Mythical Man-Month, Frederick P. Brooks, Jr.
  • Peopleware: Productive Projects and Teams, Tom DeMarco and Timothy Lister
  • Clean Agile, Robert C. Martin
  • More Effective Agile, Steve McConnell
In fact, I would be so bold as to say that not reading these once you know about them constitutes professional negligence, whether you are an engineer, a manager, or an executive. If you deal with software development in any way, producer or consumer, you must read these.

Brooks' first edition outlined the problems in software engineering in 1975. Twenty years later, his second edition showed that we were still making the same mistakes.

There are a few items that are extremely dated and quaint. Read those for their historical perspective. But don't for a moment doubt the timely relevance of the rest of the book.

Brooks is the venerated old man of this field. Everybody quotes him, particularly Brooks' Law: adding human resources to a late software project makes it later.

Every 12 years after Brooks' first edition, DeMarco and Lister addressed the theme from a different perspective in their editions of Peopleware.

Forty-four years after, we are still making the same mistakes, just cloaked in the Agile name. So McConnell's new book addresses those issues in modern supposedly Agile organizations, with suggestions about what to do about them.

Meanwhile, Martin's book returns us to the roots of Agile, literally back to the basics to reiterate and re-emphasize them. Because many of them have been lost in what Martin Fowler calls "the Agile Industrial Complex," the industry that has grown out of promoting Agile.

The first three books are easy reading. McConnell's is roughly equivalent to two of them put together. It also forms the root of a study tree of additional resources, outlining a very practical and pragmatic approach.

There are clearly some tensions and disagreements between the authors and the way things have developed. Martin goes so far as to include material with dissenting opinions in his book.

Don't just read these once. Re-read them at least once a year. Each time, different details will feel more relevant as you make progress.

Problems

The problems in the industry that have persisted for decades can be summarized as late projects, over budget, and poor software that doesn't do what it's supposed to do or just plain doesn't work.

Tied up in this are many details: poor understanding and management of requirements, woefully underestimated work, poor understanding of hidden complexities, poor testing, and poor people management.

Much of it is the result of applying the Taylor Scientific Management method to software development. Taylorism may work for a predictable production line of well-defined inputs, steps, and outputs, running at a repeatable rate, but it is a terrible model for software management. Software development is not a production line. There are far too many unknowns.

In general, most problems arise because companies practice the IMH software project management method: Insert Miracle Here. With Agile, they have adopted the IAMH variant: Insert Agile Miracle Here.

But as Brooks writes, there are no silver bullets. Relying on miracles is not an effective project management technique. This is a source of no end of misery for all involved with software.

As Sandro Mancuso, author of the Clean Code series book The Software Craftsman: Professionalism, Pragmatism, Pride (Yes! Read it!) writes in chapter 7 of Clean Agile, "Craftsmanship", "the original Agile ideas got distorted and simplified, arriving at companies as the promise of a process to deliver software faster." I.e. miracles.

A Pet Peeve (Insert Rant Here)

One of the areas of disagreement between various authors is the open-plan office. The original Agile concept was co-locating team members so that they could communicate immediately, directly, and informally, at higher bandwidth than through emails or heavy formal documents. It was meant to foster collaboration and remove impediments to effective communication.

Peopleware is extremely critical of the open-plan office, and I couldn't agree more. The prevailing implementation of it is clearly based more on the idea of cutting real-estate and office overhead costs than on encouraging productive communication. The result has all the charm of a cattle concentration feedlot, everyone getting their four square feet to exist in.

Another distortion of the Agile concepts embraced by management at the cost of actual effective development. That might make the CFO happy, but it's a false economy that should horrify the CTO.

Those capex savings can incur significant non-recurring engineering costs and create technical problems that will incur further downstream development and support costs. And that just means more opex for facilities where the engineering gets done, because the project takes longer.

You're paying me all this money to be productive and concentrate on complex problems, then you deliberately destroy my concentration to save on furniture and floorspace? It's like a real-life version of Kurt Vonnegut's short story Harrison Bergeron. What does that do to the product design and quality? What customer problems does it create, with attendant opportunity costs?

I turned down an excellent job offer in 2012 after the on-site interviews because of this. I was bludgeoned by my impression of the office environment: sweatshop. They probably thought of me as a prima donna.

McConnell also recommends against this, referencing the 2018 article It's Official: Open-Plan Offices Are Now the Dumbest Management Fad of All Time, which summarized the findings of a Harvard study on the topic. The practice appears to me to be the office-space equivalent of Taylorism.

Ok, now that I have all that off my chest, on to the actual reviews.

Clean Agile, Robert C. Martin

Martin's premise is that Agile has gotten muddled. He says it has gotten blurred through misinterpretation and usurpation.

His purpose is to set the record straight, "to be as pragmatic as possible, describing Agile without nonsense and in no uncertain terms."

He starts out with the history of Agile, how it came about, and provides an overview of what it does. He then goes on to cover the reasons for using it, the business practices, the team practices, the technical practices, and becoming Agile.

An important concept is the Iron Cross of project management: good, fast, cheap, done: pick any three. He says that in reality, each of these four attributes has a coefficient, and good management is about managing those coefficients rather than demanding they all be at 100%; that is the kind of management Agile strives to enable, by providing data.

The next concept is Ron Jeffries' Circle of Life: the diagram describing the practices of XP (eXtreme Programming). Martin chose XP for this book because he says it is the best defined, the most complete, and the least muddled of the Agile processes. He references Kent Beck's Extreme Programming Explained: Embrace Change (he prefers the original 2000 edition; my copy is due to arrive week after next).

The enumeration and description of the various practices surprised me, reinforcing his point that things have gotten muddled. While I was aware of them, I was not aware of their original meanings and intent.

The most mind-blowing moment was reading about acceptance tests, under the business practices. Acceptance tests have become a real hand-waver, "one of the least understood, least used, and most confused of all the Agile practices."

But as he describes them, they have the power to be amazing:
  • The business analysts specify the happy paths.
  • QA writes the tests for those cases early in the sprint, along with the unhappy paths (QA engineer walks into a bar; orders a beer; orders 9999 beers; orders NaN beers; orders a soda for Little Bobby Tables; etc.). Because you want your QA people to be devious and creative in showing how your code can be abused, so that you can prevent anyone else from doing it. You want Machiavelli running your QA group.
  • The tests define the targets that developers need to hit.
  • Developers work on their code, running the tests repeatedly, until the code passes them.
Holy crap! Holy crap! This ties actual business-defined requirements end-to-end through to the running code. It is a fractal-zoom-out-one-level application of Test Driven Development (and we all thought TDD was just for the developer-written unit tests!).

It completely changes the QA model. Then the unit and acceptance tests get incorporated into Continuous Build, under the team practices.
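As a rough sketch, riffing on that QA-walks-into-a-bar joke (the Bar class, its house rules, and every name here are my own invention, not from the book), acceptance tests written ahead of the code might look like:

```python
# A hypothetical bar-ordering feature and the acceptance tests QA might
# write for it early in the sprint, before the implementation exists.
import math

class Bar:
    MAX_ROUND = 100  # invented house rule: no round larger than 100 beers

    def order_beers(self, count):
        # Reject anything that isn't a sane positive integer.
        if isinstance(count, float) and math.isnan(count):
            raise ValueError("NaN is not a quantity")
        if not isinstance(count, int) or count < 1:
            raise ValueError("order at least one beer")
        if count > self.MAX_ROUND:
            raise ValueError("that's too many beers")
        return count

# Acceptance tests: the happy path plus the devious unhappy paths.
# These are the targets the developers code against until they pass.
def test_orders():
    bar = Bar()
    assert bar.order_beers(1) == 1  # happy path
    for bad in (0, -1, 9999, float("nan"), "DROP TABLE"):
        try:
            bar.order_beers(bad)
            assert False, f"should have rejected {bad!r}"
        except ValueError:
            pass  # rejected, as the test demands

test_orders()  # passes once the implementation meets the targets
```

The point is the workflow, not the toy domain: the tests exist first, encode the business analysts' happy paths and QA's abuse cases, and the story isn't done until the code passes them.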

There are other important business practices that I believe are poorly understood, such as splitting and spikes. Splitting means splitting a complex story into smaller stories, as long as you maintain the INVEST guidelines:
  • Independent
  • Negotiable
  • Valuable
  • Estimable
  • Small
  • Testable
Splitting is important when you realize a story is more complex than originally thought, a common problem. Rather than trying to beat it into submission (or be beaten into submission by the attempt), break it apart and expose the complexity in manageable chunks.

I never knew just what a spike was. It's a meta-story, a story for estimating a story. It's called that because it's a long, thin slice through all the layers of the system. When you don't know how to estimate a story, you create a spike for the sole purpose of figuring that out.

Almost as mind-blowing is his discussion of the technical practices. Mind-blowing because much of this whole area has been all but ignored by most Agile implementations. Reintroducing them is one of the strengths of this book.

Martin has been talking about this for a while. He gave the talk in this video, Robert C. Martin - The Land that Scrum Forgot, at a 2011 conference (very watchable at 2x speed). The main gist is that Scrum covered the Agile management practices, but left out the Agile technical practices, yet they are fundamental to making the methodology succeed.

These are the XP practices:
  • Test-Driven Development (TDD), the double-entry bookkeeping of software development.
  • Refactoring.
  • Simple Design.
  • Pair Programming.
Of these, I would say TDD is perhaps the most-practiced. But all of these have been largely relegated to a dismissive labeling as something only the extremos do. Refactoring is seen as something you do separately when things get so bad that you're forced into it. Pair programming in particular is viewed as a non-starter.

I got my Scrum training in a group class taught by Jeff Sutherland, so pretty much from the horse's mouth. That was 5 years ago, so my memory is a bit faded, but I don't remember any of these practices being covered. I learned about sprints and stories and points, but not about these.

As Martin describes them, they are the individual daily practices that developers should incorporate into every story as they do them. Every story makes use of them in real-time, not in some kind of separate step.


Refactoring builds on the TDD cycle, recognizing that writing code that works is a separate dimension from writing code that is clean:
  1. Create a test that fails.
  2. Make the test pass.
  3. Clean up the code.
  4. Return to step 1.
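A minimal sketch of one trip around that loop (the toy feature and all names are mine, not the book's):

```python
# Step 1 (red): write a failing test first. It defines the target.
def test_total_price():
    assert total_price([2, 3], tax_rate=0.25) == 6.25

# Step 2 (green): write the quickest thing that makes the test pass.
# Step 3 (refactor): with the test as a safety net, clean it up -- here,
# a readable sum-then-tax expression instead of a manual accumulator loop.
def total_price(items, tax_rate):
    return sum(items) * (1 + tax_rate)

test_total_price()  # step 4: green again, go write the next failing test
```

The discipline is that step 3 happens on every cycle, in real time, not as a separate cleanup project scheduled for later.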
Simple Design means "writing only the code that is required with a structure that keeps it simplest, smallest, and most expressive." It follows Kent Beck's rules:
  1. Pass all the tests.
  2. Reveal the intent (i.e. readability).
  3. Remove duplication.
  4. Decrease elements.
Pair programming is the one people find most radical and alarming. But as Martin points out, it's not an all-the-time 100% thing. It's an on-demand, as-needed practice that can take a variety of forms as the situation requires.

Who hasn't asked a coworker to look over some code with them to figure something out? Now expand that concept. It's the power of two-heads-are-better-than-one. Maybe trading the keyboard back and forth, maybe one person driving while the other talks. Sharing information, knowledge, and ideas in both directions, as well as reviewing code in real-time. There's some bang for the buck!

The final chapters cover becoming Agile, including some of the danger areas that get in the way, tools, coaching (pro and con), and Mancuso's chapter on craftsmanship, which reminds us that we do this kind of work because we love it. We are constantly striving to be better at it. I am a software developer. I want to be professional about it. This hearkens back to the roots of Agile.

More Effective Agile, Steve McConnell

McConnell has a very direct, pragmatic writing style. He is brutally honest about what works and what doesn't, and the practical realities and difficulties that organizations run into.

His main goal is addressing practical topics that businesses care about, but that are often neglected by Agile purists:
  • Common challenges in Agile implementation.
  • How to implement Agile in only part of the organization (because virtually every company will have parts that simply don't work that way, or will interact with external entities that don't).
  • Agile's support for predictability.
  • Using Agile on geographically distributed teams.
  • Using Agile in regulated industries.
  • Using Agile on a variety of different types of software projects.
He focuses on techniques that have been proven to work over the past two decades. He generalizes non-Agile approaches as Sequential development, typically in some sort of phased form.

The book contains 23 chapters, organized into these 4 parts:
  • INTRODUCTION TO MORE EFFECTIVE AGILE
  • MORE EFFECTIVE TEAMS
  • MORE EFFECTIVE WORK
  • MORE EFFECTIVE ORGANIZATIONS
It includes a full bibliography and index.

Throughout, he uses the key principle of "Inspect and Adapt": inspect your organization for particular attributes, then adapt your process as necessary to improve those attributes.

Another important concept is that Agile is not one monolithic model that works identically for all organizations. It's not one-size-fits-all, because the full range of software projects covers a variety of situations. So the book covers the various ways organizations can tailor the practices to their needs. Probably to the horror of Agile purists.

Each chapter is organized as follows:
  • Discussion of key principles and details that support them. This includes problem areas and various options for dealing with them.
  • Suggested Leadership Actions
  • Additional Resources
The Suggested Leadership Actions are divided into recommended Inspect and Adapt lists. The Inspect items are specific things to examine in your organization. I suspect they would reveal some rude surprises. The Adapt items cover actions to take based on the issues revealed by inspection.

The Additional Resources list additional reading if you need to delve further into the topics covered.

One of the very useful concepts in the book is the "Agile Boundary". This draws the line between the portion of the organization that uses Agile techniques, and the portion that doesn't. Even if the software process is 100% Agile, the rest of the company may not be.

Misunderstanding the boundary can cause a variety of problems. But understanding it creates opportunities for selecting an appropriate set of practices. This is helpful for ensuring successful Agile implementation across a diverse range of projects.

A significant topic of discussion is the tension between "pure Agile" and the more Sequential methods that might be appropriate for a given organization at a given point in a project.

The Agile Boundary defines the interface where the methods meet, and which methods are appropriate on each side of it under given circumstances. Again, Agile is not a single monolithic method that can be applied identically to every single project. As he says, it's not a matter of "go full Agile or go home".

There's a lot of information to digest here, because it all needs to be taken in the context of your specific environment. The chapters that stand out to me based on my personal experience:
  • More Effective Agile Projects: keeping projects small and sprints short; using velocity-based planning (which means you need accurate velocity measurement); delivering in vertical slices; managing technical debt; and structuring work to avoid burnout.
  • More Effective Agile Quality: minimizing the defect creation gap (i.e. finding and removing defects quickly, before they get out); creating and using a definition of done (DoD); maintaining a releasable level of quality at all times; reducing rework, which is typically not well accounted for.
  • More Effective Agile Testing: using automated tests created by the development team, including unit and acceptance tests, and monitoring code coverage.
  • More Effective Agile Requirements Creation: stories, product backlog, refining the backlog, creating and using a definition of ready (DoR).
  • More Effective Agile Requirements Prioritization: having an effective product owner, classifying stories by combined business value and development cost.
  • More Effective Agile Predictability: strict and loose predictability of cost, schedule, and feature set; dealing with the Agile Boundary.
  • More Effective Agile Adoptions.
Requirements make an interesting area, because that is often a source of problems. The Agile approach is to elicit just enough requirements up front to be able to size a story, then rely on more detailed elicitation and emergent design when working on the story.

But the problem I've seen with that is one of the classic issues in estimation. Management tends to treat those very rough initial estimates as commitments, not taking into account the fact that further refinement has been deferred. So downstream dependent commitments get made based on them.

The risk comes when further examination of the story reveals that there is more work hidden underneath than originally believed. I've seen this repeatedly. Then the whole chain of dependent commitments gets disrupted, creating chaos as the organization tries to cope.

For example, consumer-product embedded systems are very sensitive to this. The downstream dependent commitments involve hardware manufacturing and the retail pipeline, where products need to be pre-positioned to prepare for major sales cycles such as holidays.

The Christmas sales period means consumer products need to be in warehouses by mid-November at the latest. Both the hardware manufacturing facilities (and their supply chains) and the sales channels are Taylor-style systems, relying on bulk delivery and just-in-time techniques. They need predictability. That's your Agile Boundary right there, on two sides of the software project.

IoT products have fallen into the habit of relying on a day-1 OTA update after the consumer unboxes them, but that's risky. If the massive high-scale OTA of all the fielded devices runs into problems, it creates havoc for consumers, who are not going to be happy. That can have significant opportunity costs if it causes stalled revenue or returns, or some horribly expensive solution to work around a failed OTA, not to mention the reputation effect on future sales.

What about commercial/industrial embedded systems? Cars, planes, factory equipment, where sales, installation, and operation are dependent on having the software ready. These can have huge ripple effects.

Online portal rollouts that gate real-world services are also sensitive to it. Martin uses the example of healthcare.gov. People need to have used the system successfully by a certain date in order to access real-world services, with life-safety consequences.

Both of these highlight the real-world deadlines that make business sense for picking software schedule dates. As software engineers, we can't just whine about arbitrary, unreasonable dates. There's a whole chain of dependencies that needs to be managed.

Schedule issues need to be surfaced and addressed as soon as possible, just like software bugs. The later in the process a software bug is identified, the more expensive it is to fix, sometimes by orders of magnitude. Dealing with schedule bugs is no different.

In his book on estimation, McConnell talks about the Cone of Uncertainty: the greater uncertainty about estimates early in the project, which narrows to better certainty over time as more information becomes available. Absolute certainty only comes after completion. But everybody behaves as if the certainty is much better much earlier.
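A toy sketch of what the cone means in numbers. The multipliers below are roughly the ones McConnell gives for successive milestones (quoted from memory; check Software Estimation for the exact figures), and the function names are mine:

```python
# Cone of Uncertainty: a single-point estimate made early in a project
# is really a wide range. Multipliers are approximate, per McConnell.
CONE = [
    ("initial concept",             0.25, 4.0),
    ("approved product definition", 0.50, 2.0),
    ("requirements complete",       0.67, 1.5),
    ("detailed design complete",    0.80, 1.25),
]

def estimate_range(point_estimate_weeks, milestone):
    """Turn a point estimate into the (low, high) range the cone implies."""
    for name, low, high in CONE:
        if name == milestone:
            return (point_estimate_weeks * low, point_estimate_weeks * high)
    raise ValueError(f"unknown milestone: {milestone}")

# A "20 week" estimate at initial concept really means 5 to 80 weeks.
print(estimate_range(20, "initial concept"))   # (5.0, 80.0)
```

Treating that early "20 weeks" as a commitment, rather than as "somewhere between 5 and 80," is exactly how the downstream dependency chain gets poisoned.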

It's clear from the variety of information in this book that Agile is not simply a template that can be laid down across any organization and be successful. It takes work to adapt it to the realities of each organization. There is no simple recipe for success. No silver bullets.

That's why it's necessary to re-read this periodically, because each time you'll be viewing it in the context of your organization's current realities. That's continuing the Inspect and Adapt concept.

Update Nov 10, 2019


My copy of Beck's Extreme Programming Explained arrived yesterday, and I've been reading through it. Here we see the benefits of going back to original sources, in this case on open plan offices. In Chapter 13, "Facilities Strategy", he says:
The best setup is an open bullpen, with little cubbies around the outside of the space. The team members can keep their personal items in these cubbies, go to them to make phone calls, and spend time at them when they don't want to be interrupted. The rest of the team needs to respect the "virtual" privacy of someone sitting in their cubby. Put the biggest, fastest development machines on tables in the middle of the space (cubbies might or might not contain machines).
So it appears what caught on was the group open bullpen part, and what has been left out was the personal space part (and its attendant value).

There's a continuous spectrum on which to interpret Beck's recommendation, with the typical modern open office representing one end (all open space, no private space), and individual offices representing the other (no open space, all private space).

There's a point on the spectrum where I would shift to liking it, if I had a private place to make my own where I could concentrate in relative quiet, with enough space to bring in a pairing partner.

Where I find the open office breaks down is the overall noise level from multiple separate conversations. It can be a near-constant distraction when I'm trying to work (hence the rampant proliferation of headphones in open offices).

Meanwhile, when I need to have a conversation with someone, I want to be able to do it without competing with all those others, and without disturbing those around me.

What seems to me to have the most practical benefit is optimizing space for two-person interactions, acoustically isolated from other two-person interactions. So individual workspaces with room for two to work together. That allows for individual time as well as the pairing method, from simple rubber-duck debugging to full keyboard and mouse back-and-forth.

Those are both high-value, high-quality times. That's the real value proposition for the company.

And in fact, that's precisely the kind of setup Beck says Ward Cunningham told him about.

Given that most developers now work on dedicated individual machines, through which they might be accessing virtualized cloud computing resources, the argument for a centralized bullpen with machines seems less compelling.

The open bullpen space seems to be less optimal, but still useful for times when more than two people might be involved.

This is clearly a philosophical difference from Beck's intent, but I think the costs of open plan offices as he experienced them, tempered by the reality of how they've been adopted, outweigh their benefits.

Meanwhile, his followup discussion in that chapter is fully in harmony with Peopleware's Part II: "The Office Environment".

Monday, July 8, 2019

Review: Engineering A Safer World, by Nancy Leveson

This is a 6-year-old post cross-posted from my woodworking blog (written before I had this blog available). It remains as timely and important as ever. I'm reposting it motivated by the discussion of the Boeing 737 MAX, such as at EmbeddedArtistry.com (and mentioned at Embedded.fm).

As a software engineer I've been a dedicated reader of RISKS DIGEST for over 20 years. Formally, RISKS is the Forum On Risks To The Public In Computers And Related Systems, ACM Committee on Computers and Public Policy, moderated by Peter G. Neumann (affectionately known to all as PGN).

RISKS is an online news and discussion group covering various mishaps and potential mishaps in computer-related systems, everything from data breaches and privacy concerns to catastrophic failures of automated systems that have killed people. It's an extremely valuable resource, exposing people to many concerns they might otherwise not know about.

All back issues are archived and available online. It's fascinating to see the evolution of computer-related risks over time. It's also disheartening to see the same things pop up year after year as sad history repeatedly repeats itself.

Nancy Leveson's work on safety engineering has been mentioned in RISKS ever since volume 1, issue 1. She's currently Professor of Aeronautics and Astronautics and Professor of Engineering Systems at MIT. Her 2011 book Engineering A Safer World, Systems Thinking Applied to Safety, was noted in RISKS 26.71, but has not yet been reviewed there. I offer this informal review.

This book should be required reading for anyone who wishes to avoid having their work show up as a RISKS news item. There's no excuse for not reading it: Leveson and MIT Press have made it available as a free downloadable PDF (555 pages), which is how I read it. The download link is available on the book's webpage at http://mitpress.mit.edu/books/engineering-safer-world.

This was my first introduction to formal safety engineering, so yes, I speak with the enthusiasm of the newly evangelized.

The topic is the application of systems theory to the design of safer systems and the analysis of accidents in order to prevent future accidents (not, notably, to assign blame). Systems theory originated in the 1930s and 1940s to cope with the increasing complexity of systems starting to be built at that time.

This theory holds that systems are designed, built, and operated in a larger sociotechnical context. Control exists at multiple hierarchical levels, with new properties emerging at higher levels ("emergent properties"). Leveson says safety is an emergent property arising not from the individual components, but from the system as a whole. When analyzing an accident, you must identify and examine each level of control to see where it failed to prevent the accident.

So while an operator may have been the person who took the action that caused an accident, you must ask why that action seemed a reasonable one to the operator, why the system allowed the operator to take that action, why the regulatory environment allowed the system to be run that way, etc. Each of these levels may have been an opportunity to prevent the accident. Learning how they failed to do so is an opportunity to prevent future accidents.

Furthermore, systems and their contexts are dynamic, changing over time. What used to be safe may no longer be. Consider that most systems are in use for decades, with many people coming and going over time to maintain and operate them, while much in the world around them changes. Leveson says most systems migrate to states of higher risk over time. If safety is not actively managed to adapt to this change, accidents become inevitable.

Another important point is the distinction between reliability and safety. Components may operate reliably at various levels, yet still result in an accident, frequently due to the interactions between components and subsystems.

Much of Leveson's view can be summarized in two salient quotes. First is a brief comment on the human factor: "Depending on humans not to make mistakes is an almost certain way to guarantee that accidents will happen."

The second is more involved: 

"Stopping after identifying inadequate control actions by the lower levels of the safety control structure is common in accident investigation. The result is that the cause is attributed to "operator error," which does not provide enough information to prevent accidents in the future. It also does not overcome the problem of hindsight bias. In hindsight, it is always possible to see that a different behavior would have been safer. But the information necessary to identify that safer behavior is usually only available after the fact. To improve safety, we need to understand the reasons people acted the way they did. Then we can determine if and how to change conditions so that better decisions can be made in the future.

"The analyst should start from the assumption that most people have good intentions and do not purposely cause accidents. The goal then is to understand why people did not or could not act differently. People acted the way they did for very good reasons: we need to understand why the behavior of the people involved made sense to them at the time."

The book is organized into three parts. Part I, "Foundations," covers traditional safety engineering (specifically, why it is inadequate) and introduces systems theory. Part II, "STAMP: An Accident Model Based On Systems Theory," introduces System-Theoretic Accident Model and Processes, covering safety constraints, hierarchical safety control structures, and process models. Part III, "Using STAMP," covers how to apply it, including the STPA (System-Theoretic Process Analysis) approach to hazard analysis and the CAST (Causal Analysis based on STAMP) accident analysis method.

Throughout, Leveson illustrates her points with accidents from various domains. These cover a military helicopter friendly-fire shootdown, chemical and nuclear plant accidents, pharmaceutical issues, the Challenger and Columbia space shuttle losses, air and rail travel accidents, the loss of a satellite, and contamination of a public water supply. They resulted in deaths, injuries with prolonged suffering, destruction, and significant financial losses. There's also one fictional case used for training purposes.

The satellite loss was an example where there was no death, injury, or ground damage, but an $800 million satellite was wasted, along with a $433 million launch vehicle (all due to a single misplaced decimal point in a software configuration file). Financial losses in all cases included secondary costs due to litigation and loss of business. Accidents are expensive in both humanity and money.

Several accidents are examined in great detail to expose the complexity of the events and glean lessons, identifying the levels of control, the system hazards they faced, and the safety constraints they violated. They show that the answer to further prevention is not simply to punish the operator on duty at the time. What's to prevent another accident from occurring under a different operator? What systemic factors exist that increase the likelihood of accidents?

These systems affect us every day. During the time I was reading the book, there was an airline crash at San Francisco, a fiery oil train derailment in Canada, and a major passenger train derailment in Spain. I started reading it while a passenger on an aircraft model mentioned 14 times in the book, and read the remainder while traveling to and from work on the Boston commuter rail.

The book can be read on several levels. At a minimum, the case studies and analyses are horribly fascinating for the lessons they impart. Fans of The Andromeda Strain will be riveted.

As I read the account of two US Black Hawk helicopters shot down by friendly fire in Iraq, I could visualize a split screen showing the helicopters flying low in the valleys of the no-fly zone to avoid Iraqi air defense radar, the traces going inactive on the AWACS radar scopes, the F-15s picking up unidentified contacts that did not respond to IFF, and the mission controllers back in Turkey, as events ground to their inexorable conclusion. It made my hair stand on end.

All the case studies are equally jaw-dropping, down to the final example of a contaminated water supply in Ontario. Further shades of Andromeda, since that was a biological accident resulting in deaths.

They're all examples of systems that migrated to very high risk states, where they became accidents waiting to happen. It was just a matter of which particular event out of the many possible triggered the accident.

Part of what's so shocking about these cases is the enormously elaborate multilayered safety systems that were in place. The military goes to great lengths in its air operations control to avoid friendly fire incidents; the satellite software development process had numerous checkpoints; NASA had a significant safety infrastructure.

Yet it seems that this very elaborateness contributed to a false sense of safety, with uncoordinated control leaving gaps in coverage. In some cases this led to complacency that resulted in scaling back safety programs.

The other shocking cases were at the opposite end of the spectrum, where plants were run fast and loose.

The one bright spot in the case studies is the description of the US Navy's SUBSAFE program, instituted after the loss of the USS Thresher in 1963. It flooded during deep dive testing; despite emergency recovery attempts by the crew, they were unable to surface. Just pause and think about that for a moment.

SUBSAFE is an example of a tightly focused and rigorously executed safety program. The result is that no submarine has been lost in 50 years, with the sole exception of the USS Scorpion in 1968, for which the program requirements were waived. The result of that tragic lesson was that the requirements were never again waived.

The book can be read at an academic level, as a study of the application of systems theory to the creation of safer systems and analysis of accidents. It can be read at an engineering level, as a guide on how to apply the methodology in the development and operation of such systems. It's not a cookbook, but it points you in the right direction. It includes an extensive bibliography for follow-up.

Even those who work on systems that don't present life safety or property damage risks can benefit, because any system behaving poorly can make people's lives miserable. They frequently pose significant business risks, affecting the life and death of a company.

This book paired with PGN's book Computer-Related Risks would make an excellent junior or senior level college survey course for all engineering fields, along the lines of "with great power comes great responsibility". While some might feel it's a text more suited to graduate level practicum, I think it's worth conveying at the undergraduate level for broader distribution.

Friday, November 23, 2018

Review: Web-Based Course Test-Driven Development For Embedded C/C++, James W. Grenning


Full disclosure: I was given a seat in this course by James Grenning.

I took James Grenning's 3-day web-based course Test-Driven Development for Embedded C/C++ September 4-6, 2018. It was organized as a live online delivery, 5 hours each day. The schedule worked out perfectly for me in Boston, starting at 9AM each morning, but he had attendees from as far west as California and as far east as Germany, spanning 9 time zones.

The participants ranged from junior level embedded developers to those with more than 20 years of experience. One worked in a fully MISRA-compliant environment. This was the first introduction to TDD for some of them.

The course was organized as blocks consisting of presentation and discussion, coding demo, then live coding exercises. It used CppUTest as the TDD framework.

The short answer: this is an outstanding course. It will change the way you work. I highly recommend it, well worth the investment in time and money. The remote delivery method worked great.

I had previously read Grenning's book, Test Driven Development for Embedded C, which I reviewed in August. I covered a lot of his technical information on TDD in the review, so I'll only touch on that briefly here. He covers the same material in the course presentation portions.

The course naturally has a lot of overlap with the book, so each can serve as a standalone resource. But I found the combination of the two to be extremely valuable. They complement each other well because the book provides room to delve more deeply into background information, while the course provides guided practice time with an expert.

Reading the book first meant I was fully grounded in the motivations and technical concepts of TDD, so I was ready for them when he covered them in the course. I was also already convinced of its value. What the live course brings to that part is the opportunity to ask questions and discuss things.

You can certainly take the course without first reading the book, which was the case for several of the participants.


Presentations

For the presentation portions, Grenning covered the issues with development using non-TDD practices, what he calls "debug-later programming" (DLP). This consists of a long phase of debug fire-fighting at the end of development that often leaves bugs behind.

He introduced the TDD microcycle, comparing the physics of DLP to the physics of TDD. By physics he means the time domain, the time taken from injection of a bug (repeat after me: "I are a ingenuer, I make misteaks, I write bugs") to its removal. This is one of the most compelling arguments for adopting TDD. TDD significantly compresses that time frame.

He covered how to apply the process to embedded code and some of the design considerations. He also talked about the common concerns people have about TDD.

One quote from Kent Beck that I really liked:
TDD is a great excuse to think about the problem before you think about the solution.
He covered the concept of test fakes and the use of "spies" to capture information. He covered mocks as well, including mocking the actual hardware so that you can run your tests off-target.
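As a rough illustration of the spy pattern (the class and function names here are my own, not the actual course exercise code), a spy implements the same interface as the real hardware-facing object but merely records what was done to it, so a test can interrogate it afterwards, with no hardware required:

```cpp
#include <cassert>

// The interface the production code depends on (hypothetical names).
class LightController {
public:
    virtual ~LightController() {}
    virtual void turnOn(int id) = 0;
    virtual void turnOff(int id) = 0;
};

// The spy: implements the interface but only records what happened,
// so a test can ask it afterwards. No hardware required.
class LightControllerSpy : public LightController {
public:
    enum { kUnknown = -1, kOff = 0, kOn = 1 };
    LightControllerSpy() : lastId_(kUnknown), lastState_(kUnknown) {}
    void turnOn(int id) override { lastId_ = id; lastState_ = kOn; }
    void turnOff(int id) override { lastId_ = id; lastState_ = kOff; }
    int lastId() const { return lastId_; }
    int lastState() const { return lastState_; }
private:
    int lastId_;
    int lastState_;
};

// Production code under test drives the interface; the test
// observes through the spy instead of through real hardware.
void turnEverythingOff(LightController& lc, int maxId) {
    for (int id = 0; id <= maxId; ++id) lc.turnOff(id);
}
```

A test passes the spy in where production code would receive the real controller, then asserts on what the spy recorded.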

He covered refactoring to keep tests easy to follow and maintain. He also covered refactoring of "legacy code" (i.e. any production code that was not built using TDD), including "code smells" and "code rot", using TDD to provide a safety harness. This included a great quote from Donald Knuth (bold emphasis mine):
Let us change our traditional attitude to the construction of programs. Instead of imagining that our main task is to instruct a computer what to do, let us concentrate rather on explaining to human beings what we want a computer to do.

Coding Demos

Grenning performed two primary live coding demos. First, he used TDD to build a circular buffer, a common data structure in embedded systems. He used this to demonstrate the stepwise process of applying the TDD microcycle.

Second, he performed a refactoring demo on a set of tests. He used this to show how to apply the refactoring steps to simplify the tests and make the test suite more readable and maintainable.

This was just as valuable as the TDD microcycle, because a clean test suite means it will live a long and useful life. Failing to refactor and keep it clean risks making it a one-off throwaway after getting its initial value.
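To give a flavor of where the circular buffer demo ends up (this is my own sketch using plain asserts in place of CppUTest's test macros, not Grenning's actual demo code), each check below would have started life as a single failing test that drove the next slice of behavior:

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// A minimal circular buffer, grown one failing test at a time.
class CircularBuffer {
public:
    explicit CircularBuffer(std::size_t capacity)
        : buf_(capacity), head_(0), count_(0) {}

    bool empty() const { return count_ == 0; }
    bool full() const { return count_ == buf_.size(); }

    // Rejects the value instead of overwriting when full.
    bool put(int value) {
        if (full()) return false;
        buf_[(head_ + count_) % buf_.size()] = value;
        ++count_;
        return true;
    }

    // Returns false when there is nothing to get.
    bool get(int& value) {
        if (empty()) return false;
        value = buf_[head_];
        head_ = (head_ + 1) % buf_.size();
        --count_;
        return true;
    }

private:
    std::vector<int> buf_;
    std::size_t head_;
    std::size_t count_;
};

// Stand-ins for CppUTest TEST() cases, in the order they might
// have been written: each drove the next bit of implementation.
void test_new_buffer_is_empty() {
    CircularBuffer b(3);
    assert(b.empty() && !b.full());
}

void test_values_come_out_in_fifo_order() {
    CircularBuffer b(3);
    b.put(1); b.put(2);
    int v = 0;
    assert(b.get(v) && v == 1);
    assert(b.get(v) && v == 2);
    assert(b.empty());
}

void test_put_fails_when_full() {
    CircularBuffer b(2);
    assert(b.put(1) && b.put(2));
    assert(!b.put(3));  // full: reject rather than overwrite
}
```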


Coding Exercises

Grenning uses Cyber-Dojo to conduct exercises (as well as his demos). This is a ready-to-use, cloud-based Linux environment for the edit-build-run cycle that allows each student to work individually while he monitors everyone's code as they work. This turned out to be one of the most valuable aspects of the course.

I should also mention that I had read Jeff Langr's book Modern C++ Programming with Test-Driven Development: Code Better, Sleep Better in between reading Grenning's book and taking this course. Langr puts a lot of emphasis on short, easily-readable tests, and that's something that also comes out in Grenning's class.

What was so valuable about doing these exercises in Cyber-Dojo is that Grenning was able to stop someone who was heading off in the wrong direction and quickly bring them back on track, or help them if they weren't sure what to do next. That fast feedback cycle is very much in tune with TDD itself. It works just as well as a teaching method.

So if someone started writing code without a test, or wrote too much code for what the test covered, or had too much duplication in tests, or had too much setup that could be factored out, he let them know and guided them back. In some cases he interrupted the exercise to go through someone's code on the screen with everybody watching, not to put them on the spot, but to cover the issues that we all run into.

That's critical because learning to truly work effectively in TDD style requires a reorientation of your thinking. We all have the coding habits of years that need to be overcome.

That doesn't happen automatically just because you read a book and have decided to follow it. It takes effort; half the effort is just noticing that you're straying from the intended path. That's the value of having a live instructor who can watch over your shoulder. It's like being an apprentice under the watchful eye of a master craftsman.

For me, this was ultimately the greatest value in the class. Having Grenning provide real-time guidance had an immediate effect on my coding, for both the test code and the production code. Whether it was talking about my mistakes or someone else's, I was able to immediately improve my work.

That made a huge difference between the test code I wrote before the class and the test code I wrote by the end of the class.

The coding exercises were building our own circular buffer, building a light controller spy, using TDD with the spy to implement a light scheduler, and implementing a flash chip driver. Note that these exercises are also covered in his book.

I also found that Cyber-Dojo made for an interesting example of pair programming, something I've never done before. Grenning provided initial files to work on, like a pair partner guiding you in the next step, then provided active feedback, like a partner asking questions and making suggestions: "Are you missing something there? What if you tried this? Wait, before you do that...".


The Big Lesson

The big lesson for me from this course was that it finally drove home that TDD is ALL ABOUT DEVELOPMENT! Sometimes I have to be clubbed over the head for something to really sink in, and that's what happened here.

We get so focused on the word "test" in TDD that we jump to the conclusion that it's just a test methodology. We emphasize test, as in TEST-Driven Development.

But really, the emphasis should be reversed, it's Test-Driven DEVELOPMENT. That means you apply design concepts and address the requirements of the product as you engage in a very active development thought process that is driven forward by tests.

Did you ever write some throwaway test code just so you could see how something worked, or to explore some design ideas? Hmmm, well TDD formalizes that.

The fact that you do end up with useful unit tests is almost a side effect of the process. An extremely valuable side effect, but a side effect nonetheless.

The real output of the process is working production code. That's what really matters. That's the real goal.

At some point on the last day of the course, I recognized the change in emphasis deep in my being. Maybe the difference is subtle, but it is critical.

That recognition first started to dawn after I read the book and applied it at work. I was amazed at the cleanliness of the resulting code. It was DRY and DAMP and SOLID, with no further refinement or debugging required.

Yes, I had a unit test suite. But look at the production code! It was breathtaking, right out of the chute. That was motivating.

It was in that receptive frame of mind that I did the coding exercises in the course. That was when the club hit. It was one of those moments of realization that divides time into what came before and what came after the moment of grok, providing a whole new lens through which to perceive the work.

Savor that consideration for a moment.

People have been saying for years that TDD is about development, but we tend to focus on the test. Grenning emphasizes development when he talks about developing while "test-driving", meaning he is doing his development driven by tests. I guess it just takes time for the real implications to sink in.

One of Grenning's slides quotes Edsger Dijkstra:
Those who want really reliable software will discover that they must find means of avoiding the majority of bugs to start with, and as a result, the programming process will become cheaper. If you want more effective programmers, you will discover that they should not waste their time debugging, they should not introduce the bugs to start with.
While we all aspire to be like Dijkstra, this seems like a pipe dream. Until you realize that TDD does exactly that. It provides the shortest path to working software. I think he would have liked that.

Now that I've relegated the test aspect of this to second-class citizenship, let me bring it back to prominence.

The testing aspect approaches Dijkstra's ideal, because it finds bugs immediately as part of the code, build, test cycle. So the bugs are squashed before they've had time to scatter and hide in the dark corners. That reduces the dreaded unbounded post-development debug cycle to near zero.

If you don't let bugs get into the code, you won't have to spend time later removing them. Yeah, what Dijkstra said.

This doesn't guarantee bug-free code. There might still be bugs that occur during the integration of parts that are working (for example, one module uses feet, while another uses inches), or the code may not have addressed the requirements properly (the customer wanted a moving average of the last 5 data points, while the code uses the average of all data points), but as a functional unit, each module is internally consistent and working according to its API.

The resulting unit test suite is an extremely valuable resource, just as valuable as the production code. What makes it so valuable? Two things: safety harness, and example usage.

It provides a safety harness to allow you to do additional work on the code, then run the suite as a regression test to prove you haven't broken anything. Or to detect breakage so you can fix it immediately.

Using and extending the suite liberates you to make changes to the code safely. Need to add some functionality? Fix one of those integration or requirements bugs? Refactor for better performance or maintainability? Clean up some tech debt? Have at it.

You can instantly prove to yourself that you haven't screwed anything up, or show that you have, so that you can fix it before it ever gets committed to the codebase. No one will ever see that dirty laundry.

It provides example usage, showing how to use the API: how to call the various functions, in what order, how to set up and drive various behavioral scenarios, how to exercise the interfaces for different functional behaviors, how different parameters affect things, and how to interpret return values.

This is real, live code, showing how to use the production code as a client. You can even get creative and add exploratory tests that push the production code in odd directions to see what happens. Grenning calls these characterization tests and learning tests.

The test suite is actually something quite magical: self-updating documentation! Since you need to invest the time to maintain the tests in order to get the development done, you are also automatically updating the example usage documentation for free.

You might argue that tools like Doxygen offer similar self-updating capability, but they still require updating textual explanations along with the code. They are subject to the same staleness that can happen with any comments, where the comments (or Doxygen annotations) aren't kept up to date with code changes (see Tim Ottinger's Rules for Commenting for advice to help avoid stale comments).

But if you want to really know how to use the production code, go read the tests! If you've truly followed the TDD process as Grenning shows you in this course, they will tell you how to produce every bit of behavior that it is capable of, because every bit of behavior implemented will have been driven by the tests.

That's the full-circle, closed-loop feedback power of test-driven DEVELOPMENT.

Doxygen still has its place. I think of the Doxygen docs as API reference, while the test suite is API tutorial, showing actual usage.


Another Lesson

I've already alluded to the other interesting lesson that I drew from this course: it takes practice! We're not used to working like this, so it takes practice and self-awareness to learn how to do it.

That was particularly driven home by the coding exercises. Even though I had just read his book and followed through the exact same exercises, and read Langr's book, and applied the knowledge at work, I still had trouble getting rolling on the first couple of exercises. It was a matter of instilling the new habits.

It took a few times having Grenning redirect me (or listen to the advice he gave someone else). By the final exercise, after the benefit of his live feedback, I was able to catch myself in time and start applying the habits on my own.

It's still going to take some time. I'll know I've gotten there when I start thinking of the tests automatically as the first step of coding.


Third Time's A Charm

At one point in the discussion I mentioned that Grenning's book and this course represented my third attempt at using TDD, and one of the participants said he would be interested in hearing about my previous attempts.

My first attempt was in 2007, when I was introduced to TDD by a coworker. I read Kent Beck's Test Driven Development: By Example and used it to develop the playback control module for a large video-on-demand server intended for use in cable provider head ends.

This was both a great success and a classic failure. It was a great success in that it accelerated my work on the module, avoiding many bugs and shortening the debug cycle. In that respect it lived up to the promise of TDD completely.

It was a classic failure in that I made the tests far too brittle. I put too much internal knowledge of the module in them, with many internal details that were useful when I was first developing the module, but that became a severe impediment to ongoing maintenance.

The classic symptom of this problem was that a minor change in implementation would cause a cascade of test failures. The production code was fine; some internal detail the tests checked, such as a counter value, had simply changed. The test code itself was otherwise fine, but I had overburdened it with details that should have been hidden by encapsulation.
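A contrived sketch of the difference (not the actual video-on-demand code, and the names are my own) might look like this: the brittle test pins an internal counter, while the behavioral test checks only what a client can observe through the public API:

```cpp
#include <cassert>

// A contrived module with an internal retry counter that
// callers should not care about (hypothetical, for illustration).
class Playback {
public:
    Playback() : playing_(false), retries_(0) {}
    void play() { playing_ = true; ++retries_; /* imagine retry logic */ }
    bool isPlaying() const { return playing_; }
    int retriesForTestingOnly() const { return retries_; }  // leaked detail
private:
    bool playing_;
    int retries_;
};

// Brittle: pins an internal counter. Any harmless change to the
// retry strategy breaks this test even though behavior is fine.
void brittle_test() {
    Playback p;
    p.play();
    assert(p.retriesForTestingOnly() == 1);  // implementation detail!
}

// Robust: checks only observable behavior through the public API,
// so it survives internal refactoring.
void behavioral_test() {
    Playback p;
    assert(!p.isPlaying());
    p.play();
    assert(p.isPlaying());
}
```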

The result was that ultimately I had to abandon the test suite. It had provided good initial value, but failed to deliver on-going value because it became a severe maintenance burden.

This is exactly the type of situation that Grenning's course seeks to prevent. During coding exercises, he watches out for cases of inappropriate information exposure. Thus another benefit of the course is improved encapsulation and information hiding.

My second attempt was in 2013, when I wanted to refactor some of the code in an IP acceleration server as part of improvements to one of its features. I had read Michael Feathers' Working Effectively with Legacy Code, and found that many of the things he covered applied to the codebase I was working on.

This was a revenue-generating service product, so I needed to be sure I didn't break it.

The main strategy the book covers is to use TDD to provide that safety harness I mentioned above, in order to verify that the legacy code behaves the same after modification as it did before.

I began building a set of test fakes that could be used with Google Test. One issue was that the code relied heavily on the singleton pattern, so there always had to be some implementation of each class that would satisfy the linker. And of course there were chains of such dependencies interlocked in a web.

My first task was to replace that bit by bit with dependency injection. I focused just on the parts necessary to allow me to test the area I was modifying. Part of Feathers' strategy is to tackle just enough of the system at a time to be able to make progress, rather than a wholesale break-everything-down-and-rebuild approach.
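The mechanical shape of that change might be sketched like this (hypothetical names, not the actual server code): instead of reaching for a singleton, the class receives its dependency through the constructor, so a test can hand it a fake without dragging in the whole web of real implementations:

```cpp
#include <cassert>

// Before (sketched): a hard-wired singleton dependency.
//   class Config { public: static Config& instance(); ... };
//   class Session { void handle() { Config::instance().maxConnections(); } };
// Every test must then link and configure the real Config.

// After: the dependency arrives through the constructor.
class Config {
public:
    virtual ~Config() {}
    virtual int maxConnections() const = 0;
};

class ProductionConfig : public Config {
public:
    int maxConnections() const override { return 1000; }  // e.g. from a file
};

// A fake the tests control completely.
class FakeConfig : public Config {
public:
    explicit FakeConfig(int max) : max_(max) {}
    int maxConnections() const override { return max_; }
private:
    int max_;
};

class Session {
public:
    explicit Session(const Config& cfg) : cfg_(cfg) {}
    bool accept(int currentConnections) const {
        return currentConnections < cfg_.maxConnections();
    }
private:
    const Config& cfg_;
};
```

Production code constructs `Session` with a `ProductionConfig`; a unit test constructs it with a `FakeConfig` and never touches the real configuration machinery.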

I had enough success with this that once I finished my primary work on the feature changes, I embarked on a background project to put the entire codebase into 100% dependency injection. That would allow me to build unit tests for any arbitrary component, in combination with any set of faked dependencies, with the longer-term goal of building out near-100% unit test coverage incrementally.

However, not too long after starting this, I ended up changing jobs. So once again I got the short-term benefit from TDD, but didn't reap the long-term benefit. It was a useful exercise to go through, though, providing good experience on how to migrate such a codebase to TDD.

This is another area that Grenning's course covers.


Related Links

For the perspective of another class participant, see Phillip Johnston's post What I Learned from James Grenning's Remote TDD Course.

There are things about the TDD process that make people suspicious. Is it just hacking? In this interview with Grenning, embedded systems expert Jack Ganssle raises some of those concerns. Grenning explains how the process works to reach the goal of well-designed, working production code that meets customer requirements.

Elecia and Christopher White have a great interview podcast with Grenning. Best joke: how many Scrum masters does it take to manage one developer? Also good Shakespeare and Bradbury quotes that are much ado about programming.

Friday, November 16, 2018

Accuracy Vs. Precision

This is nothing new, but it's something that needs to be constantly hammered home. It's an important point that can make a critical difference in the behavior of embedded systems interacting with the sloppiness of physics in the real world.

I was reminded of the topic by Elecia White's excellent video Intro to Inertial Sensors: From Taps to Gestures to Location. The inertial sensors that are now common in smartphones and embedded systems are accelerometers, gyroscopes, and magnetometers, possibly integrated into a single Inertial Measurement Unit (IMU).


But working in the digital world with sensor data converted from the analog world poses interesting problems. Some of these are addressed in Jack W. Crenshaw's amazing book Math Toolkit for Real-Time Programming. There is always error in the system to some degree, so you have to be prepared to handle it.

Accuracy and precision are two of those problems, and have been since the dawn of measurement. It's important to understand the distinction between them. They are often confused in informal usage.

A common analogy for understanding them is taken from riflery, showing a shooting target. White includes a version of it in her video. As I learned while earning the Boy Scout riflery merit badge at Resica Falls summer camp lo these many years ago, you want your shots to be tightly grouped together (precision), and you want that group to be on-target, centered around the bull's-eye (accuracy).

The following image is taken from the NOAA article Accuracy Versus Precision, which does a nice job of explaining the difference. I'll briefly restate it here should NOAA scientific information mysteriously disappear from the Web.

Accuracy is how close a measurement is to the true value, how close it is to the bull's-eye. Precision is how closely repeated measurements come to duplicating measured values, how tightly they are grouped.



Not Accurate Not Precise: these are not close to the bull's-eye, so the measurements are not close to the true value, and they are not tightly grouped, so repeated measurements have a lot of difference.

Accurate Not Precise: these are close to the bull's-eye, so the measurements are centered around the true value, but they are not tightly grouped, so repeated measurements range all over the place.

Not Accurate Precise: these are not close to the bull's-eye, so the measurements are not close to the true value, but they are tightly grouped, so repeated measurements are close to each other. From a riflery perspective, this is good, because it means you have control, you just need to adjust your sight to compensate.

Accurate Precise: these are both close to the bull's-eye and tightly grouped. The measurements are on-target, close to the true value, and repeated measurements give close to the same result.
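In code, the two properties fall out of simple statistics on repeated samples: the offset of the mean from the true value measures the accuracy error (bias), while the standard deviation measures the precision (spread). A minimal sketch, with function names of my own choosing:

```cpp
#include <cmath>
#include <vector>

// Average of a set of repeated measurements.
double mean(const std::vector<double>& xs) {
    double sum = 0.0;
    for (double x : xs) sum += x;
    return sum / xs.size();
}

// Accuracy error: how far the average measurement sits from
// the known true value (the distance from the bull's-eye).
double bias(const std::vector<double>& xs, double trueValue) {
    return mean(xs) - trueValue;
}

// Precision: standard deviation of the samples (how tightly
// the shot group clusters).
double stddev(const std::vector<double>& xs) {
    double m = mean(xs);
    double ss = 0.0;
    for (double x : xs) ss += (x - m) * (x - m);
    return std::sqrt(ss / xs.size());
}
```

A precise-but-inaccurate sensor shows a large bias and a small standard deviation; an accurate-but-imprecise one shows the reverse.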

In an embedded system, you need to characterize and calibrate things. Characterization means understanding how much variation a sensor has in its measurements, how precise it is (which, as White explains, can vary with temperature and barometric pressure, plus humidity, external vibration, external electrical and magnetic fields, external sources of Radio Frequency Interference (RFI), and other factors; man, the real world is a sloppy place!).

Calibration determines how far off the measurements are from the true value and adjusts the values to compensate for the difference.

Meanwhile, the calculations that use the values must be able to handle the accuracy and precision appropriately, along with odd cases such as a true zero value being measured as a small negative value (because the measurement is centered around zero, but may range between small negative and positive limits). Treating values as if they are more accurate or precise than they really are is downright dangerous.

That can lead the embedded system to crash or take inappropriate actions. If it happens to be controlling the flight of an airplane or the operation of a chemical plant, people can be killed and tremendous damage can result. If it happens to be controlling a consumer device, the consequences may be less dire, but can be equally damaging to the company.

There is always error and noise in the system. You have to understand it and how to manage it.

Saturday, September 22, 2018

So You Want To Be An Embedded Systems Developer

Then listen now to what I say.
Just get an electric guitar
and take some time and learn how to play.

Oh, wait, that's a song by the Byrds. But the strategy is the same. Get some information and tools and learn how to use them. No need to sell your soul to the company.

The items I've listed below are sufficient to get you started on a career as an embedded systems developer. There are of course many additional resources out there. But these will arm you with enough knowledge to begin.

I own or have watched every resource and piece of hardware listed on this page. I've either gone through them entirely, or am in the process of doing so. I can vouch for their usefulness in getting you up to speed. It's a firehose of learning.

My personal learning method is to bounce around between multiple books and videos in progress, while spending time hands-on with the hardware. This is similar to a college student juggling multiple classes with labs (without tests, term papers, or due dates!).

Your method may be different. Feel free to approach things in a different order. I offer this in the spirit of SODOTO (see one, do one, teach one).

What's An Embedded System?

It's a computer that's embedded inside another product, like a car, a microwave, a robot, an aircraft, or a giant industrial machine in a factory; or an IoT device like an Amazon Echo, a Sonos speaker, or a SimpliSafe home security system. You think of the thing as the end product, not as a computer. The computer happens to be one of the things inside that makes it work.

The fascinating thing about embedded systems is that you get to have your hands in the guts of things. The code you write makes a physical object interact with the real world. It's a direct control feedback loop. Working on them is incredibly seductive.

Embedded systems are a multi-disciplinary endeavor. At a minimum they require a mix of electronics and software knowledge. Depending on the particular application (the end product you're building), they may also require some combination of mechanical, materials science, physics, chemical, biological, medical, or advanced mathematical knowledge.

Hobbyist vs. Professional Hardware

There's a wide range of hardware available to learn on, at very reasonable prices. Most of the microcontrollers and boards were originally aimed at the professional community, but advances in technology and falling prices have made them accessible to the hobbyist and educational communities.

Meanwhile, those same advances have enabled hardware aimed directly at the hobbyist and educational communities. Some of that hardware has advanced to the point that it is used in the professional community. So the lines have been blurred.

All of the boards covered here have a variety of hardware interfaces and connectors that allow you to connect up other hardware devices. These are the various sensors, indicators, and actuators that allow an embedded system to interact with the real world.

Two hobbyist/educational platforms are Arduino and Raspberry Pi. For a beginner, these offer a great way to start out. There's an enormous amount of information available on using them from the hobbyist, educational, and maker communities.

I've listed a few books on them below in the Primary Resources, and there are a great many more, as well as free videos and websites. These books tend to be written at a more beginner level than books aimed at professionals.

Arduino is a bare-metal platform, meaning it doesn't run an operating system. An IDE (Integrated Development Environment) is available for free, for writing and running programs on it. You program it with the C and C++ programming languages.

Many of the low-level details are taken care of for you. That's both the strength and the weakness of Arduino.

It's a strength because it offers a quick streamlined path to getting something running. That makes it a great platform for exploring new concepts and new hardware.

It's a weakness because it isolates you too much from the critical low-level details that you need to understand in order to progress beyond the level of beginner.

Those low-level details are the difference between success in the real world and dangerous mediocrity. Dangerous as in you can actually get people killed, so if you want to do this professionally, you need to understand the responsibility you're taking on.

My attitude is to take advantage of that streamlined path whenever needed, and use it to boost yourself into the more demanding work. There are always going to be new pieces of hardware to hook up to an Arduino. I'll always start out at the beginner level learning about them.

In that context, Arduino makes a great prototyping and experimentation platform, without having to worry so much about the low-level details. Then, every bit of knowledge I pick up that way can be carried over to more complex platforms. Meanwhile, Arduino is a perfectly capable platform in its own right.

Raspberry Pi is a Linux platform, meaning it is a single-board computer running the Linux operating system. In some ways it is similar to Arduino, in that many low-level details are taken care of for you.

But it is more capable due to more hardware interfaces and the Linux environment. It can operate as a full desktop computer in the palm of your hand. You program it with the Python, C, and C++ programming languages, as well as others. The Linux capability opens up lots of possibilities.

Many of the same arguments for and against Arduino apply to Raspberry Pi. It also offers a great way to learn Linux and its application to embedded systems. It can be used at the beginner level, but also offers greater range to go beyond that.

Professional hardware, aimed at commercial and industrial use, offers the classic embedded systems development experience. This is where you need to be able to dig down to the low levels. These platforms run both bare-metal and with operating systems.

The operating systems tend to be specialized, especially when the application requires true hard real-time behavior, but also include embedded Linux.

Hard real-time means the system must respond to real-world stimulus on short, fixed deadlines, reliably, every time, or the system fails. For instance, an aircraft flight control system that must respond to a sensor input within 100ms, or the plane crashes. Or a chemical plant process control system that must respond to a sensor within 100ms, or the plant blows up and spews a cloud of toxic chemicals over the neighboring city. Or a rocket nozzle control system that must respond to guidance computer input within 50ms or it goes off course and has to be destroyed, obliterating $800 million worth of satellite and launch vehicle.

That's what system failure can mean, and those are the responsibilities you take on. There are hard real-time systems with less severe consequences of failure, as well as soft real-time systems with looser deadlines and allowable failure cases (such as a smart speaker losing its input stream after 200ms and failing to play music), but it's important to keep in mind what can be at stake.

If your goal is to work professionally as an embedded systems developer, you need to be able to work with the professional hardware. But don't hesitate to use the hobbyist hardware to give you a leg up learning new things. The broad range of experience from working with all of them will give you great versatility and adaptability.

The Primary Resources

The items listed below are all excellent resources that provide the minimum required knowledge for a beginner, progressing up to more advanced levels. If you already have some knowledge and experience, they'll fill in the gaps.

These are well-written, very practical guides. There's some overlap and duplication among them, but each author has a different perspective and presentation, helping to build a more complete picture.

They also have links and recommendations for further study. Once you've gone through them, you'll have the background knowledge to tackle more advanced resources.

The most important thing you can do is to practice the things covered. This material requires hands-on work to really get it down, understand it, and be able to put it to use, especially if you're using it to get a job.

Whether you practice as you read along or read through a whole book first, invest the time and effort to actually do what it says. That's how you build the skills and experience that will help you in the real world.

Expect to spend a few days to a few weeks on each of these resources, plus a few additional months of practice. While they're mostly introductory, some assume more background knowledge than others, such as familiarity with binary and hexadecimal numbers. You can find additional information on these topics online by searching on some of the keywords.

Some of the material can be very dense at first, so don't be afraid to go through it more than once. Also, coming back to something after having gone through other things helps break through difficulties.

Looking at this list, it may seem like a lot. Indeed, it is an investment in time and money, some items more than others. But if you think of each one as roughly equivalent to half a semester of a college course once you put in the time to practice the material, this adds up to about two years' worth of focused college education.

That's on par with an Associate degree, or half of a Bachelor's degree. And it will leave you with practical skills that you can put to use on a real job.

These are in a roughly recommended order, but you can go through the software and electronics materials in parallel. You might also find it useful to jump around between different sections of different books based on your knowledge level at the time. Note that inexpensive hardware is listed in the next part of this post, including some of the boards these use.

If you find some of the material too difficult, never fear, back off to the beginner resources. If you find some too simple, never fear, it gets deep. Eventually, it all starts to coalesce, like a star forming deep in space, until it ignites and burns brightly in your mind.

The resources:
You can learn Arduino in 15 minutes: This is a nice short video that talks about the basics of Arduino microcontroller systems. It helps to start breaking down the terminology and show some of the things involved. That makes it a good introduction to more involved topics. You can also dive down the rabbit hole of endless videos on Arduino, microcontrollers, and electronics from here. This guy's channel alone offers lots of good information.
Hacking Electronics: Learning Electronics with Arduino and Raspberry Pi, 2nd Edition, 2017, by Simon Monk. This is a great beginner-level hands-on book that covers just enough of a wide range of hardware and software topics to allow you to get things up and running, starting from zero knowledge.
Programming the Raspberry Pi: Getting Started with Python, 2nd Edition, 2016, by Simon Monk. This is a nice practical guide to Python on the Raspberry Pi, with much more detail on programming than his Hacking Electronics above. Meanwhile it has less beginner information on hardware. So the two books complement each other nicely.
Programming Arduino: Getting Started with Sketches, 2nd Edition, 2016, by Simon Monk. Similar to his book on Python, but for C on Arduino, also a nice complement to his Hacking Electronics.
Embedded Software Engineering 101: This is a fantastic blog series by Christopher Svec, Senior Principal Software Engineer at iRobot. What I really like about it is that he goes through things at very fine beginner steps, including a spectacular introduction to microcontroller assembly language.
Modern Embedded Systems Programming: This is a breathtakingly spectacular series of short videos by Miro Samek that take you from the ground up programming embedded systems. They're fast paced, covering lots of material at once, including the C programming language, but he does a great job of breaking things down. He uses an inexpensive microcontroller evaluation kit (see hardware below) and the free size-limited evaluation version of the IAR development software suite. He also has a page of additional resource notes. What I really like about this is that in addition to covering a comprehensive set of information with many subtle details, he shows exactly how the C code translates to data and assembly instructions in microcontroller memory and registers. In contrast to Arduino, this is all the low-level details. You will know how things work under the hood after this course (currently 27 videos). Along the way you'll pick up all kinds of practical design, coding, and debugging skills that would normally take years to acquire. Did I mention this course is freakin' awesome?
RoboGrok: This is an amazing complete online 2-semester college robotics video course by Angela Sodemann at Arizona State University, available to the public. Start with the preliminaries page. In addition to some of the basics of embedded systems, it covers kinematics and machine vision, doing hands-on motor and sensor control through a PSoC (Programmable System on a Chip) board. She sells a parts kit, listed below. This is a great example of applied embedded systems.
C Programming Language, 2nd Edition, 1988, by Brian W. Kernighan and Dennis M. Ritchie: C is the primary language used for embedded systems software, though C++ is starting to become common. This is the seminal book on C, extremely well-written, that influenced a generation of programming style and other programming books. The resources listed above all include some basics of C, and this will complete the coverage.
Embedded C Coding Standard, 2018 (BARR-C:2018), by Michael Barr: This will put you on the right track to writing clean, readable, maintainable code with fewer bugs. It's a free downloadable PDF, which you can also order as an inexpensive paperback. Coding standards are an important part of being a disciplined developer. When you see ugly, hard to read code, you'll appreciate this.
Programming Embedded Systems: in C and C++, 1999, by Michael Barr: Even though this is now 20 years old, it's a great technical introduction and remains very relevant. Similar in many respects to Samek's video series, it takes a beginner through the process of familiarizing yourself with the processor and its peripherals, and introduces embedded operating system concepts. There is a later edition available, but this one is available used at reasonable prices. 
Programming Arduino Next Steps: Going Further with Sketches, 2nd Edition, 2019, by Simon Monk. This goes deeper into Arduino, covering more advanced programming and interfacing topics. It also includes information on the wide array of third-party non-Arduino boards that you can program with the IDE. This starts to get past the argument that Arduino is just for beginners doing little toy projects.
Making Embedded Systems: Design Patterns for Great Software, 2011, by Elecia White. This is an excellent book on the software for small embedded systems that don't use operating systems (known as bare-metal, hard-loop, or superloop systems), introducing a broad range of topics essential to all types of embedded systems. And yes, the topic of design patterns is applicable to embedded systems in C. It's not just for non-embedded systems in object-oriented languages. The details of implementation are just different. 
Exploring Raspberry Pi: Interfacing to the Real World with Embedded Linux, 2016, by Derek Molloy. This goes into significantly more depth on the Raspberry Pi and embedded Linux. It's quite extensive, so is best approached by dividing it into beginner, intermediate, and advanced topics, based on your knowledge level at the moment. Spread out your reading accordingly. It has great information on hardware as well as software, including many details of the Linux environment. Two particularly fascinating areas are using other microcontrollers such as Arduino as slave real-time controllers, and creating Linux Kernel Modules (LKMs).
Make: Electronics: Learning Through Discovery, 2nd Edition, 2015, by Charles Platt. This is hands down the best book on introductory electronics I've ever seen. Platt focuses primarily on other components rather than microcontrollers, covering what all those other random parts on a board do. See Review: Make: Electronics and Make:More Electronics for more information on this and the next book, and Learning About Electronics And Microcontrollers for additional resources.
Make: More Electronics: Journey Deep Into the World of Logic Chips, Amplifiers, Sensors, and Randomicity, 2014, by Charles Platt. More components that appear in embedded systems.
Debugging: The 9 Indispensable Rules for Finding Even the Most Elusive Software and Hardware Problems, 2006, David J. Agans. By now you've found many ways to get into trouble with code and hardware. This is a fantastic book for learning how to get out of trouble. It's a simple read that outlines a set of very practical rules that are universally applicable to many situations, then elaborates on them with real-life examples. 
Real-Time Concepts for Embedded Systems, 2003, by Qing Li and Caroline Yao. This is an introduction to the general concurrency control mechanisms in embedded operating systems (and larger-scale systems).
Reusable Firmware Development: A Practical Approach to APIs, HALs, and Drivers, 2017, by Jacob Beningo. This covers how to write well-structured low-level device driver code in a way that you can use on multiple projects. Embedded systems are notorious for having non-reusable low-level code, tightly bound to a particular hardware design, which can often ripple up to higher levels. That means you have to rewrite everything every time for every project. Good Hardware Abstraction Layers (HALs) and Application Programming Interfaces (APIs) provide a disciplined, coherent approach that allows you to reuse code across projects, saving you enormous amounts of time in development and testing. This also helps you become a better designer, because it encourages modular thinking: working in terms of the broader architecture in a strategic manner, not just the immediate problem at hand in a tactical manner.
Embedded Systems Architecture, 2018, by Daniele Lacamera. This is a very up-to-date book that uses the popular ARM Cortex-M microcontroller family as its reference platform. That makes it a great complement to Samek's video series, since the TI TIVA C that he uses is an ARM Cortex-M processor. This also goes into more detail on areas such as the toolchain (including debugging with OpenOCD), bootloading, and memory management. It briefly uses the ST STM32F746 Discovery board as an example.
Embedded Systems Fundamentals with Arm Cortex-M based Microcontrollers: A Practical Approach, 2017, by Alexander G. Dean. As the name indicates, this is another detailed book on ARM Cortex-M, intended as a college-level textbook. Among other good practical details, it includes a nice chapter on analog interfacing. It uses the inexpensive NXP FRDM-KL25Z development board for hands-on examples.
TI Tiva ARM Programming For Embedded Systems: Programming ARM Cortex-M4 TM4C123G with C, 2016, by Muhammad Ali Mazidi, Shujen Chen, Sarmad Naimi, and Sepehr Naimi. This is a detailed book that uses the exact same Tiva C board as Samek's video series. 
Designing Embedded Hardware: Create New Computers and Devices, 2nd Edition, 2005, by John Catsoulis. This covers the hardware side of things, an excellent complement to White's book. It provides the microcontroller information to complement Platt's books.
Test Driven Development for Embedded C, 2011, by James Grenning. This is a spectacular book on designing and writing high quality code for embedded systems. See Review: Test Driven Development for Embedded C, James W. Grenning for full details. Just as White's book applies concepts from the OO world to embedded systems, Grenning applies Robert C. Martin's "Clean Code" concepts that are typically associated with OO to embedded systems. We'll all be better off for it.
Modern C++ Programming with Test-Driven Development: Code Better, Sleep Better, 2013, by Jeff Langr. This is an equally spectacular book on software development. It reinforces and goes into additional detail on the topics covered in Grenning's book, so the two complement each other well. Even if you don't know C++, it's generally easy enough to follow and the material still applies.
Taming Embedded C (part 1), 2016, by Joe Drzewiecki. This YouTube video is part of the Microchip MASTERs conference series. It covers some of the things that can be risky in embedded code and some methods for avoiding them. This gets into the characteristics that make embedded systems more challenging. I like to watch videos like this at 2X speed initially. Then I go back through sections at normal speed if I need to watch them more carefully.
Interrupt and Task Scheduling - No RTOS Required, 2016, by Chris Tucker. Another MASTERs video, this covers a critical set of topics for working in embedded systems.
Some Advanced Resources

Ready to dig in further and deeper?
µC/OS: The Real-Time Kernel, 1992, by Jean Labrosse. Labrosse decided to write his own real-time operating system when he had trouble getting support for a commercial one he was using. The rest is history. You can hear some of that history in this podcast interview with him, "How Hard Could It Be?". This not only explains how things work under the hood, it gives you the source code.
µC/OS-III: The Real-Time Kernel for the Texas Instruments Stellaris MCUs, 2010, by Jean Labrosse. This covers the 3rd generation of µC/OS, as well as details on the Stellaris microcontroller covered in Samek's video series. You can also download a free PDF version of this, as well as companion software. The µC/OS-II and other books are also available there. The value in getting multiple versions is to see how the software evolved over time.
Software Engineering for Embedded Systems: Methods, Practical Techniques, and Applications, 2nd edition, 2019, edited by Robert Oshana and Mark Kraeling. This is a broad survey of topics by various authors (Labrosse wrote the chapter on real-time operating systems).
Some Hardware

The items listed below include some of the inexpensive boards and evaluation kits used in the resources above. There are a bazillion microcontroller boards out there that are useful for learning how to work on embedded systems. It's worth getting some from different vendors so you can learn their different microcontrollers, different capabilities, and different toolchains.

That also helps you appreciate the importance of abstracting low-level hardware differences in writing your code. Each vendor provides a range of support tools as part of the package.

Note that large vendor websites can be a pain, because they want you to create an account with a profile, asking questions like your company name (call it "Independent"), what your application is, how many zillion parts you expect to order, when you expect to ship your product, etc. They're set up for industrial use, not hobbyist/individual use. They also may work through distributors like Mouser or Digi-Key for shipping and orders. Just roll with it!

The hardware:
Arduino Uno - R3, $22.95. This is the board used in the Arduino video listed above. There's also a wide array of "shields" available, external devices that connect directly to the board. Exploring these is one of the great educational values that Arduino offers. Remember that because Arduino takes care of many of the details for you, you can be up and learning about new devices faster. Then you can take that knowledge and apply it to other boards. You can also download the Arduino IDE there.
Raspberry Pi 3 - Model B+, $35. This is an updated version of the boards used in Simon Monk's books above. You will also need the 5V 2.5A Switching Power Supply with 20AWG MicroUSB Cable, $7.50, and the 8GB Card With full PIXEL desktop NOOBS - v2.8. You may also want the Mini HDMI to HDMI Cable - 5 Feet, $5.95, and the Ethernet Hub and USB Hub w/ Micro USB OTG Connector, $14.95. These are sufficient to connect it to a monitor, keyboard, and mouse, and use it as a desktop Linux computer.
Texas Instruments MSP430F5529 USB LaunchPad Evaluation Kit, $12.99 (16-bit microcontroller). An evaluation kit is a complete ready-to-use microcontroller board for general experimentation. Christopher Svec uses this kit in his blog series above, where he also covers using the free downloadable development software. If you buy directly from the TI site, register as an "Independent Designer".
Texas Instruments Stellaris LaunchPad Evaluation Kit, was $12.99 (32-bit microcontroller). This is the kit Miro Samek started out with in lesson 0 of his video series above. However, as he points out at the start of lesson 10, TI no longer sells it, and has replaced it with the Tiva C LaunchPad, which is an acceptable replacement (see next item below). 
You might be able to find the Stellaris offered by third-party suppliers. But you have to be careful that you're actually going to get that, and not the Tiva C kit, even though they list it as Stellaris. I now have two Tiva C boards because of that, one that I ordered directly from TI, and one that was shipped when I specifically ordered a Stellaris from another vendor.
Fortunately, that doesn't matter for this course, but it highlights one of the problems you run into with embedded systems: vendors change their product lines and substitute products (sometimes it's just rebranding existing products with new names, which appears to be what TI did here). That can be confusing and annoying at best, and panic-inducing at worst, if something in your project absolutely depends on a hardware feature of the original product that's not available on the replacement.
One of the design lessons you should learn is to future-proof your projects and try to isolate hardware-specific features so that you can adapt to the newer product when necessary.
Texas Instruments Tiva C TM4C123G LaunchPad Evaluation Kit, $12.99 (32-bit microcontroller). This is TI's replacement for the Stellaris LaunchPad, that you can use with Miro Samek's video series. Samek addresses the replacement issue at the beginning of lesson 10. The good news is that he says the Tiva C is equivalent to the Stellaris (apparently all TI did was rename the product), so it's usable for the course. You'll notice that some parts of the toolchain (the software you use to develop the software for the board, in this case the IAR EWARM size-limited evaluation version) still refer to it as TI Stellaris. 
The specific TI device on the board is the TM4C123GH6PM, so when you set the EWARM Project->Options->General Options->Device, you can select TexasInstruments->TM4C->TexasInstruments TM4C123GH6PM, not the LM4F120H5QR that's on the Stellaris board. However, Samek shows that you can continue to use the toolchain configured for Stellaris.
That's one of those details that can be maddening when vendors swap parts around on you. Getting it wrong can produce subtle problems, because some things may work fine (you selected a device variant very similar to the one you need), but others won't. Welcome to the world of embedded development! Small details matter. The alphabet soup and sea of numbers in the product names can also drive you batty and be a source of mistakes. PAY CLOSE ATTENTION!
A related detail: the file lm4f120h5qr.h that Samek supplies in his projects for working with the Stellaris board's processor also works with the Tiva C board's processor. However, there is also a TM4C123GH6PM.h file for the Tiva processor. Both files are in the directory C:\Program Files (x86)\IAR Systems\Embedded Workbench 8.2\arm\inc\TexasInstruments (or whichever version of EWARM you have).
You can copy them to your project directory, or have the compiler use that directory as an additional include directory by selecting Project->Options->C/C++ Compiler and clicking the ... button next to the "Additional include directories:" box.
STMicroelectronics STM32F746 Discovery Board, $54 (ARM Cortex-M7 microcontroller). This is used briefly in Daniele Lacamera's book above. It's relatively expensive compared to the other evaluation kits here, but includes a 4.3" LCD capacitive touch screen and other hardware elements, making it a much more capable platform, and still an outstanding value.
NXP Semiconductor FRDM-KL25Z Freedom Development Board, $15 (ARM Cortex-M0+ microcontroller). This is the board that Alexander Dean uses in his book above.
uC32: Arduino-programmable PIC32 Microcontroller Board, $34 (Microchip PIC32 32-bit processor). This isn't covered specifically by any of the resources above, but the PIC32 microcontroller is a popular family that offers a different hardware environment. This is programmable using the Arduino IDE, and can also be programmed using Microchip's MPLAB IDE.
Adafruit Parts Pal, $19.95. This is a small general parts kit for working with the various boards above. It includes LEDs, switches, resistors, capacitors, simple sensors, a small breadboard, and jumper wires for interconnecting things, plus a few other interesting items.
RoboGrok parts kit, $395. This is the parts kit for Angela Sodemann's course above. While you can gather the parts yourself for less, she saves you all the work of doing that, and buying her kit is a nice way of compensating her. 
Extech EX330 Autoranging Mini Multimeter, $58. There are also a bazillion multimeters out there. This is one reasonable mid-range model. The multimeter is a vital tool for checking things on boards.
One of the following logic analyzers. A logic analyzer is an incredibly valuable tool that allows you to see complex signals in action on a board. They used to cost thousands of dollars and need a cart to roll them around. These are miraculously miniaturized versions that fit in your pocket, at a price that makes them a practical, must-have personal tool. They plug into your USB port and are controlled via free downloadable software you run on your computer:
Saleae Logic 8 Logic Analyzer, 8 D/A Inputs, 100 MS/s, $199 after an awesome $200 "enthusiast/student" discount, using a discount code that you can request and apply to your cart when checking out, thanks guys! This is also covered briefly in Svec's blog series. You can play with the software in simulation mode if you don't have an analyzer yet.
Digilent Analog Discovery 2, 100MS/s USB Oscilloscope, Logic Analyzer and Variable Power Supply, Pro Bundle, $299. As amazing as the Saleae is, this one adds oscilloscope, power supply, and signal generator functions, combining a number of pieces of equipment into one tiny package. They also have academic discounts for those who qualify (36% discount on the base unit).
For a full shopping list to equip a personal electronics lab, see the Shopping List heading at Limor Fried Is My New Hero. That page also has many links to resources on how to use the tools.

Glossaries

It can be a bit maddening as you learn the vocabulary, with lots of terms, jargon, and acronyms being thrown around as if you completely understood them. As you get through the resources, the accumulation of knowledge starts to clarify things. Sometimes you'll need to go back and reread something once you get a little more information.
Other Links

These sites have articles and links useful for beginners through advanced developers.
Final Thought

Our society is becoming more and more dependent on embedded systems and the various backend and support systems they interact with. It's our responsibility as developers to build security in to make sure that we're not creating a house of cards ready to collapse at any moment. Because people's lives can depend on it.

If you think I'm overstating that, see Bruce Schneier's new book. We are the ones on the front lines.