How we built Picture-in-Picture in Firefox Desktop with more control over video

Picture-in-Picture support for videos shipped to Firefox Desktop users in version 71 on Windows, and in version 72 on macOS and Linux. It allows the user to pull a <video> element out into an always-on-top window, so that they can switch tabs or applications and keep the video within sight. This is ideal if, for example, you want to keep an eye on that sports game while also getting some work done.

As always, we designed and developed this feature with user agency in mind. Specifically, we wanted to make it extremely easy for our users to exercise greater control over how they watch video content in Firefox.

Firefox is shown playing a video, and a mouse cursor enters the frame. Upon clicking on the Picture-in-Picture toggle on the video, the video pops out into its own always-on-top player window.
Using Picture-in-Picture in Firefox is this easy!

In these next few sections, we’ll talk about how we designed the feature and then we’ll go deeper into details of the implementation.

The design process

Look behind and all around

To begin our design process, we looked back at the past. In 2018, we graduated Min-Vid, one of our Test Pilot experiments. We asked the question: “How might we maximize the learning from Min-Vid?” Thanks to the amazing Firefox User Research team, we had enough prior research to understand the main pain points in the user experience. However, it was important to acknowledge that the competitive landscape had changed quite a bit since 2018. How were users and other browsers solving this problem already? What did users think about those solutions, and how could we improve upon them?

We had two essential guiding principles from the beginning:

  1. We wanted to turn this into a very user-centric feature, and make it available for any type of video content on the web. That meant that implementing the Picture-in-Picture spec wasn’t an option, as it requires developers to opt-in first.
  2. Given that it would be available on any video content, the feature needed to be discoverable and straightforward for as many people as possible.

Keeping these principles in mind helped us to evaluate all the different solutions, and was critical for the next phase.

Three sketches showing a possible drag and drop interaction for picture-in-picture
Exploring different interactions for Picture-in-Picture

Try, and try again

Once we had an understanding of how others were solving the problem, it was our turn to try. We wanted to ensure discoverability without making the feature intrusive or annoying. Ultimately, we wanted to augment — and not disrupt — the experience of viewing video. And we definitely didn’t want to cause issues with any of the popular video players or platforms.

A screenshot of a YouTube page with a small blue rectangle on the right edge of the video, center aligned
A screenshot of one of our early prototypes

This led us to build an interactive, motion-based prototype using Framer X. Our prototype provided a very effective way to get early feedback from real users. In tests, we didn’t focus solely on usability and discoverability. We also took the time to re-learn the problems users are facing. And we learned a lot!

The participants in our first study appreciated the feature, and while it did solve a problem for them, it was too hard to discover on their own.

So, we rolled our sleeves up and tried again. We knew what we were going after, and we now had a better understanding of users’ basic expectations. We explored, brainstormed solutions, and discussed technical limitations until we had a version that offered discoverability without being intrusive. After that, we spent months polishing and refining the final experience!

Stay tuned

From the beginning, our users have been part of the conversation. Early and ongoing user feedback is a critical aspect of product design. It was particularly exciting to keep Picture-in-Picture in our Beta channel as we engaged with users like you to get your input.

We listened, and you helped us uncover new blind spots we might have missed while designing and developing. At every phase of this design process, you’ve been there. And you still are. Thank you!

Implementation detail

The Firefox Picture-in-Picture toggle exists in the same privileged shadow DOM space within the <video> element as the built-in HTML <video> controls. Because this part of the DOM is inaccessible to page JavaScript and CSS stylesheets, it is much more difficult for sites to detect, disable, or hijack the feature.

Into the shadow DOM

Early on, however, we faced a challenge when making the toggle visible on hover. Sites commonly structure their DOM such that mouse events never reach a <video> that the user is watching.

Often, websites place transparent nodes directly over top of <video> elements. These can be used to show a preview image of the underlying video before it begins, or to serve an interstitial advertisement. Sometimes transparent nodes are used for things that only become visible when the user hovers the player — for example, custom player controls. In configurations like this, transparent nodes prevent the underlying <video> from matching the :hover pseudo-class.

Other times, sites make it explicit that they don’t want the underlying <video> to receive mouse events. To do this, they set the pointer-events CSS property to none on the <video> or one of its ancestors.

To work around these problems, we rely on the fact that the web page is being sent events from the browser engine. At Firefox, we control the browser engine! Before sending out a mouse event, we can check to see what sort of DOM nodes are directly underneath the cursor (re-using much of the same code that powers the elementsFromPoint function).

If any of those DOM nodes is a visible <video>, we tell that <video> that it is being hovered, which shows the toggle. Likewise, we use a similar technique to determine if the user is clicking on the toggle.
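As a rough illustration of this hit-testing idea, the sketch below uses the web-exposed document.elementsFromPoint function to look for a visible <video> beneath a point. It is an approximation in page-level JavaScript, not the engine code. The names findVideoUnderCursor and isVisibleVideo are made up for this example, and "visible" is simplified to "has a non-zero box". The web-exposed function also follows normal hit-testing rules, so unlike an engine-level check it may skip nodes that set pointer-events to none.

```javascript
// Sketch only: approximates the check described above in page-level
// JavaScript. Firefox performs its check inside the engine, not with this code.

// Returns true if the element is a <video> with a non-empty box.
// Assumption: "visible" is reduced to "has non-zero width and height".
function isVisibleVideo(element) {
  if (element.tagName !== "VIDEO") {
    return false;
  }
  const rect = element.getBoundingClientRect();
  return rect.width > 0 && rect.height > 0;
}

// `doc` is anything that offers elementsFromPoint(x, y), such as `document`.
// Returns the topmost visible <video> at the point, or null if there is none.
function findVideoUnderCursor(doc, x, y) {
  const stack = doc.elementsFromPoint(x, y); // topmost element first
  return stack.find(isVisibleVideo) || null;
}
```

A page could call this from a mousemove listener, for example findVideoUnderCursor(document, event.clientX, event.clientY). A transparent overlay sitting on top of the video does not hide it from this lookup, because every element in the stack is inspected rather than only the topmost one.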

We also use some simple heuristics based on the size, length, and type of video to determine if the toggle should be displayed at all. In this way, we avoid showing the toggle in cases where it would likely be more annoying than not.
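A heuristic of this kind could look like the sketch below. The thresholds, the shouldShowToggle name, and the shape of the input object are all hypothetical, chosen only to show how size, length, and type might be combined. They are not the values or the code that Firefox uses.

```javascript
// Hypothetical thresholds for illustration; not Firefox's actual values.
const MIN_DIMENSION_PX = 140;
const MIN_DURATION_SECONDS = 45;

// `video` is a plain description of a video (an assumption for this sketch):
//   { width, height, duration, hasAudio }
// Returns true if showing the toggle seems worthwhile.
function shouldShowToggle(video) {
  // Size: tiny videos are often decorative thumbnails or previews.
  const tooSmall =
    video.width < MIN_DIMENSION_PX || video.height < MIN_DIMENSION_PX;

  // Length: very short clips are rarely worth popping out.
  // Live streams report an infinite duration and are never "too short".
  const tooShort =
    Number.isFinite(video.duration) && video.duration < MIN_DURATION_SECONDS;

  // Type: silent videos are frequently background decoration.
  const silent = !video.hasAudio;

  return !(tooSmall || tooShort || silent);
}
```

Keeping each signal in its own named boolean makes it easy to tune or drop one rule without touching the others.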

A browser window within a browser

The Picture-in-Picture player window itself is a browser window with most of the surrounding window decoration collapsed. Flags tell the operating system to keep it on top. That browser window contains a special <video> element that runs in the same process as the originating tab. The element knows how to show the frames that were destined for the original <video>. As with much of the Firefox browser UI, the Picture-in-Picture player window is written in HTML and powered by JavaScript and CSS.

Other browser implementations

Firefox is not the first desktop browser to ship a Picture-in-Picture implementation. Safari 10 on macOS Sierra shipped with this feature in 2016, and Chrome followed in late 2018 with Chrome 71.

In fact, each browser maker’s implementation is slightly different. In the next few sections we’ll compare Safari and Chrome to Firefox.

Safari

Safari’s implementation involves a non-standard WebAPI on <video> elements. Sites that know the user is running Safari can call video.webkitSetPresentationMode("picture-in-picture"); to send a video into the native macOS Picture-in-Picture window.
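A site that wants to use this call defensively might wrap it in a feature check, as in the sketch below. The enterSafariPictureInPicture name is invented for this example, and the check relies on the WebKit-prefixed webkitSupportsPresentationMode method being present alongside webkitSetPresentationMode.

```javascript
// Sketch: tries to move `video` into Safari's Picture-in-Picture window.
// Returns true if the request was made, false if the prefixed API is
// missing or reports that the mode is unsupported.
function enterSafariPictureInPicture(video) {
  const supported =
    typeof video.webkitSupportsPresentationMode === "function" &&
    typeof video.webkitSetPresentationMode === "function" &&
    video.webkitSupportsPresentationMode("picture-in-picture");

  if (!supported) {
    return false;
  }

  video.webkitSetPresentationMode("picture-in-picture");
  return true;
}
```

Because the methods are non-standard, the function returns false in other browsers instead of throwing.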

Safari includes a context menu item for <video> elements to open them in the Picture-in-Picture window. Unfortunately, reaching it requires an awkward double right-click on sites like YouTube that override the default context menu. This awkwardness is shared with all browsers that implement the context menu option, including Firefox.

Safari’s video context menu on YouTube.

Safari users can also right-click on the audio indicator in the address bar or the tab strip to trigger Picture-in-Picture:

The Safari web browser playing a video, with the context menu for the audio toggle in the address bar displayed. “Enter Picture in Picture” is one of the menu items.
Here’s another way to trigger Picture-in-Picture in Safari.

On newer MacBooks, Safari users might also notice the button on the touchbar, immediately to the right of the volume slider. You can use this button to open the currently playing video in the Picture-in-Picture window:

A close-up photograph of the MacBook Pro touchbar when a video is playing. There is an icon next to the playhead scrubber that opens the video in an always-on-top player window.
Safari users with more recent MacBooks can use the touchbar to enter Picture-in-Picture too.

Safari also uses the built-in macOS Picture-in-Picture API, which delivers a very smooth integration with the rest of the operating system.

Comparison to Firefox

Despite this, we think Firefox’s approach has some advantages:

  • When multiple videos are playing at the same time, the Safari implementation is somewhat ambiguous as to which video will be selected when using the audio indicator. It seems to be the most recently focused video, but this isn’t immediately obvious. Firefox’s Picture-in-Picture toggle makes it extremely obvious which video is being placed in the Picture-in-Picture window.
  • Safari appears to have an arbitrary limitation on how large a user can make their Picture-in-Picture player window. Firefox’s player window does not have this limitation.
  • There can only be one Picture-in-Picture window system-wide on macOS. If Safari is showing a video in Picture-in-Picture, and then another application calls into the macOS Picture-in-Picture API, the Safari video will close. Firefox’s window is Firefox-specific. It will stay open even if another application calls the macOS Picture-in-Picture API.

Chrome’s implementation

The PiP WebAPI and WebExtension

Chrome’s implementation of Picture-in-Picture mainly centers around a WebAPI specification being driven by Google. This API is currently going through the W3C standardization process. Superficially, this WebAPI is similar to the Fullscreen WebAPI. In response to user input (like clicking on a button), site authors can request that a <video> be put into a Picture-in-Picture window.
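From a site author's point of view, using the proposed WebAPI might look like the sketch below. The togglePictureInPicture name and its return values are made up for this example. The properties it reads (pictureInPictureEnabled, pictureInPictureElement) and the methods it calls (requestPictureInPicture, exitPictureInPicture) come from the draft specification, which may still change.

```javascript
// Sketch of a site-side toggle built on the proposed Picture-in-Picture WebAPI.
// `doc` is the page's document and `video` is a <video> element; both are
// passed in so the function does not depend on globals.
// Returns a promise that resolves to "entered", "exited" or "unsupported".
function togglePictureInPicture(doc, video) {
  const supported =
    doc.pictureInPictureEnabled === true &&
    typeof video.requestPictureInPicture === "function";

  if (!supported) {
    return Promise.resolve("unsupported");
  }

  if (doc.pictureInPictureElement === video) {
    // This video is already in the Picture-in-Picture window: bring it back.
    return doc.exitPictureInPicture().then(() => "exited");
  }

  // Browsers are expected to reject this unless it runs in response to
  // user input, such as a click.
  return video.requestPictureInPicture().then(() => "entered");
}
```

A page would typically wire this to a button, for example button.addEventListener("click", () => togglePictureInPicture(document, video)).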

Like Safari, Chrome also includes a context menu option for <video> elements to open in a Picture-in-Picture window.

The Chrome web browser playing a video, with the context menu for the video element hovering over top of it. “Picture in Picture” is one of the menu items.
Chrome’s video context menu on YouTube.

This proposed WebAPI is also used by a PiP WebExtension from Google. The extension adds a toolbar button. The button finds the largest video on the page, and uses the WebAPI to open that video in a Picture-in-Picture window.

The Chrome web browser playing a video. The mouse cursor clicks a button in the toolbar provided by a WebExtension which pops the video out into an always-on-top player window.
There’s also a WebExtension for Chrome that adds a toolbar button for opening Picture-in-Picture.
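The "largest video" selection can be sketched as follows. This is not the extension's source code. The findLargestVideo name is hypothetical, and measuring size by the area of the rendered box is an assumption made for this example.

```javascript
// Sketch: returns the <video> with the largest rendered area in `doc`,
// or null if the page has no video with a non-empty box.
// `doc` is anything that offers querySelectorAll, such as `document`.
function findLargestVideo(doc) {
  let largest = null;
  let largestArea = 0;

  for (const video of Array.from(doc.querySelectorAll("video"))) {
    const rect = video.getBoundingClientRect();
    const area = rect.width * rect.height;
    if (area > largestArea) {
      largest = video;
      largestArea = area;
    }
  }

  return largest;
}
```

The sketch also shows the limitation of the approach: on a page with several videos, the user has no way to pick any video other than the biggest one.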

Google’s WebAPI lets sites indicate that a <video> should not be openable in a Picture-in-Picture player window. When Chrome sees this directive, it doesn’t show the context menu item for Picture-in-Picture on the <video>, and the WebExtension ignores it. The user is unable to bypass this restriction unless they modify the DOM to remove the directive.
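In the draft specification, this directive is the disablePictureInPicture attribute on the <video> element. Removing it by hand, as a user might do from the developer tools console, could look like the sketch below. The clearPictureInPictureOptOut name is invented for this example.

```javascript
// Sketch: removes the site's Picture-in-Picture opt-out from `video`.
// Returns true if the attribute was present and has been removed,
// false if there was nothing to remove.
function clearPictureInPictureOptOut(video) {
  if (!video.hasAttribute("disablepictureinpicture")) {
    return false;
  }
  video.removeAttribute("disablepictureinpicture");
  return true;
}
```

Having to do this for every video on every site that opts out is exactly the kind of friction described above.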

Comparison to Firefox

Firefox’s implementation has a number of distinct advantages over Chrome’s approach:

  • The Chrome WebExtension only targets the largest <video> on the page. In contrast, the Picture-in-Picture toggle in Firefox makes it easy to choose any <video> on a site to open in a Picture-in-Picture window.
  • Users have access to this capability on all sites right now. Web developers and site maintainers do not need to develop, test and deploy usage of the new WebAPI. This is particularly important for older sites that are not actively maintained.
  • Like Safari, Chrome seems to have an artificial limitation on how big the Picture-in-Picture player window can be made by the user. Firefox’s player window does not have this limitation.
  • Firefox users have access to this Picture-in-Picture capability on all sites. Websites are not able to directly disable it via a WebAPI. This creates a more consistent experience for <video> elements across the entire web, and ultimately more user control.

Recently, Mozilla indicated that we plan to defer implementation of the WebAPI that Google has proposed. We want to see if the built-in capability we just shipped will meet the needs of our users. In the meantime, we’ll monitor the evolution of the WebAPI spec and may revisit our implementation decision in the future.

Future plans

Now that we’ve shipped the first version of Picture-in-Picture in Firefox Desktop on all platforms, we’re paying close attention to user feedback and bug intake. Your input will help determine our next steps.

Beyond bug fixes, we’d like to share some of the things we’re considering for future feature work:

  • Repositioning the toggle when there are visible, clickable elements overlapping it.
  • Supporting video captions and subtitles in the player window.
  • Adding a playhead scrubber to the player window to control the current playing position of a <video>.
  • Adding a control for the volume level of the <video> to the player window.

How are you using Picture-in-Picture?

Are you using the new Picture-in-Picture feature in Firefox? Are you finding it useful? Please let us know in the comments section below, or send us a Tweet with a screenshot! We’d love to hear what you’re using it for. You can also file bugs for the feature here.

About Mike Conley

Engineer working on Firefox for Desktop

About Emanuela Damiani


29 comments

  1. Kilian

    Hi I just wanted to thank you all for your great work!…

    January 15th, 2020 at 09:47

  2. wes

    Any plans to add “playlist” functionality?

    January 16th, 2020 at 01:25

  3. Victor

    Really great useful function.
    Used Chrome just for this thing before.

    But could you please add also timeline bar and left right button controls of time.

    Thankyou

    January 16th, 2020 at 02:46

  4. dc12

    Good job guys!

    January 16th, 2020 at 08:42

  5. Chris

    Great Job on the Picture in Picture. I really like it. It does need to have the ability to control volume and a video scrub bar. Otherwise it is pretty awesome. I detest the spec Google is trying to push through. I do not think Developers should be able to push content to Picture in picture (Ads). i think if google has their way, it will destroy the feature.

    January 16th, 2020 at 08:50

  6. Eugene

    Native PiP on macOs, please! I work with apps on full-screen mode, and your pip window can’t overlap them!

    January 16th, 2020 at 09:36

  7. Matt

    Great to know about the double-right click trick for getting the standard context menu. Is there a similar tap/gesture which can be used on Firefox for Android to get the native context menu to display?

    January 16th, 2020 at 17:19

    1. Mike Conley

      > Is there a similar tap/gesture which can be used on Firefox for Android to get the native context menu to display?

      I’m not aware of one, sorry.

      January 21st, 2020 at 11:09

  8. mt

    Chrome experience is better:
    1. Do not destroy the website ui
    2. Consistent with full screen experience
    3. The developer requests the undisplayed video into the picture-in-picture

    January 16th, 2020 at 19:42

  9. Mohak

    This is indeed a very useful feature. On macOS, unlike the system’s PiP window, firefox’s PiP window does not show up when I switch to a different space. Could you guys please consider adding this?

    January 16th, 2020 at 22:13

    1. Mike Conley

      Thanks, this issue is being tracked in https://bugzilla.mozilla.org/show_bug.cgi?id=1610613.

      January 21st, 2020 at 11:21

  10. Øystein

    This feature should not be called PiP; its proper name is ADDBD – Attention Deficit Disorder By Design.

    But technically it might be great for all I know.

    January 16th, 2020 at 22:21

  11. al

    Great new feature! Using it here and there ;)

    I’d say the scrubber would be highest on my list, e.g. today I wanted to know how far into the video I was – so more a read-only thing at that moment.

    January 17th, 2020 at 02:20

  12. Arpit

    There are problems using it on sites where picture-in-picture toggle conincides with other controls of the video player(specifically you can see it on udemy). I wish you people can adjust the picture-in-picture toggle to a different location in such situations or find some other solution for these kind of cases.
    – Thanks

    January 17th, 2020 at 08:16

  13. Forc

    I really wish that I could set so I can make it so all or certain currently playing videos automatically goes to PiP when i switch tabs and maybe even optionally closes PiP when I go back to the original tab.

    It would also be cool if Firefox could enable PiP when you minimize it or it loses focus but I’m not sure if that’s possible.

    Sometimes just want certain videos to automatically follow me around when I browse other sites.

    Thanks for this great feature.

    January 18th, 2020 at 00:41

  14. Daniel

    “Repositioning the toggle when there are visible, clickable elements overlapping it.”

    Really can’t wait for this!! When studying on Udemy.com, the Picture-in-picture button is right behind “Skip to next video”. Changing position would be awesome!

    January 18th, 2020 at 10:01

  15. N

    I’m actually really happy with this feature. Great work

    January 18th, 2020 at 19:16

  16. Sebas

    Really nice feature it just really really needs seek and volume controls it’s frustrating to have to go back in a video or forward and you have to close the window then resize it all over again.

    January 19th, 2020 at 17:25

  17. Agustina Chaer

    One other thing that safari has and I can not seem to get it working on firefox is the picture in picture window staying visible in all spaces in mac. Does it work already? is it in the plans?

    January 21st, 2020 at 05:18

    1. Mike Conley

      Thanks – this issue is being tracked in https://bugzilla.mozilla.org/show_bug.cgi?id=1610613.

      January 21st, 2020 at 11:21

  18. Joel Stransky

    I’ve created an alternative PiP plugin for Chrome called “Kite Screen” that allows choice of video.

    https://chrome.google.com/webstore/detail/kite-screen/defdbpekkkcojmcogkfgehlfinjhcdno

    January 22nd, 2020 at 09:28

  19. shamim kulabako

    i think their is a problem using it on sites where picture-in-picture toggle coincides with other controls. we tried it on https://www.gorillawalkingsafaris.com/ and failled.

    But i guess we need to learn more and more and we can get it better

    Thanks though

    January 23rd, 2020 at 02:07

  20. Benni

    Hi,
    Thanks for writing on this interesting topic.

    I’ve been embracing firefox as my browser on macOS since quantum and I’m super happy with it. I’m still using Safari just for youtube/netflix (with PIPifier extension). I’m just not a fan of the custom Firefox PiP implementation, mainly due to lacking support for workspaces. I’ve seen a few apps trying to have custom PiP implementations, but even though there is some somewhat working with workspaces, they are still very quirky for the most times. For example while “sliding” to a workspace, the video is black. Or, most of them, will not work above fullscreen apps. This is where native macOS PiP will work pretty well.

    I see that there is some advantages to have a custom integration, but as someone working on a 13″ mac daily, I’m relying super heavily on workspaces and fullscreen apps and I really don’t believe custom PiP will ever be able to provide the same level of UX as the native one does. And this, I guess, also applies to Windows PiP.

    I don’t completely understand why so much effort is put into reinventing the weel. Some of the critizism may be valid (size limitations, positional limitations, multiple simultanous videos), but, for me, these shortcomings are by far outweight by the advantages.

    Would love to see some community discussions and maybe polls about how other users feel about what solutions are more desirable. I, for my part, hope firefox will some day support native PiP so I can ditch Safari completely.

    January 24th, 2020 at 02:21

  21. Priya Singh

    Really nice feature it just really needs seek and volume controls it’s frustrating to have to go back in a video or forward and you have to close the window then resize it all over again. this type article i am search long time. so yore very very thanks. Sarkari Naukri

    January 30th, 2020 at 19:37

  22. Chris

    For me, the key use case is to detach the video from the browser window – and that isn’t implemented.

    January 31st, 2020 at 18:00

  23. Bruce Williams

    I love the new picture-in-picture. I use it all the time, mainly when I’m doing something in another app, but still want to keep an eye on the video.

    February 2nd, 2020 at 14:37

  24. Bruce Williams

    Just started watching the superbowl stream on foxsports.com – there is no picture-in-picture option. Is it just some format that’s not yet supported, or can sites opt out of this feature?

    February 2nd, 2020 at 15:33

  25. Michael Bruus

    When websites – as some of mine does – uses a fullscreen video as background, the PiP button is really annoying. It would be really nice if websites had the option to hide it, thus indicating, that PiP is not intended for that specific video. If users insist, they’ll still have the option to right-click and force PiP anyway.

    I think that would be a good compromise.

    I find it somewhat invasive, when browsers force controls upon my website design.

    February 7th, 2020 at 18:22

    1. Mike Conley

      Hi Michael,

      Are you able to provide a link to your website? We don’t tend to show the toggle on videos that are silent, which is usually the case for background videos. Does your background video have an audio track?

      February 7th, 2020 at 18:32
