Ruby加载外部代码的4种方法(英)

2015-9-26 6:23:26 | 浏览:
评论数:0

There are many ways to load Ruby code, and that has lead to confusion over the years. In this article, I will give you the backstory behind several conventions seen in the wild and share some stories about how I use those conventions in my own code.

The topic of code loading breaks up naturally into two subtopics: loading code within your own project and loading code from third-party libraries. People tend to struggle more with loading code properly within their own projects than they do with loading code from third-party libraries, so that's what I'll focus on exclusively in this issue.

For now, I will focus on the basic mechanics of load(), auto_load(), require(), and require_relative(). I'll discuss how they work so you can then think about how they can be used within your own projects.

Kernel#load

Suppose we have a file called calendar.rb that contains the code shown here:

class Calendar
  def initialize(month, year)
    @month = month
    @year  = year
  end

  # A simple wrapper around the *nix cal command.
  def to_s
    IO.popen(["cal", @month.to_s, @year.to_s]) { |io| io.read }
  end
end

puts Calendar.new(8, 2011)

Given an absolute path to this file, the contents will be loaded and then executed immediately:

>> load "/Users/seacreature/devel/practicing-ruby-2/calendar.rb"
    August 2011
Su Mo Tu We Th Fr Sa
    1  2  3  4  5  6
 7  8  9 10 11 12 13
14 15 16 17 18 19 20
21 22 23 24 25 26 27
28 29 30 31

I can also just specify a path relative to my current working directory and get the same results. That means that if calendar.rb is in the same directory from which I invoked my irb session, I'm able to call load() in the manner shown here:

>> load "./calendar.rb"
    August 2011
Su Mo Tu We Th Fr Sa
    1  2  3  4  5  6
 7  8  9 10 11 12 13
14 15 16 17 18 19 20
21 22 23 24 25 26 27
28 29 30 31

An interesting thing about load() is that it does not do any checks to see whether it has already loaded a file and will happily reload and reexecute a file each time you tell it to. So, in practice, the implementation of load() is functionally similar to the code shown here:

def fake_load(file)
  eval File.read(file)
  true
end

The main benefit of indiscriminately reloading and reexecuting code is that you can make changes to your files and then load() them again within a single session without having to restart the program that's loading the code. So, for example, if we changed calendar.rb to output August 2012 instead of August 2011, we could just load it again without restarting irb. But we'd also be greeted with some warnings in the process:

>> load "./calendar.rb"
/Users/seacreature/devel/practicing-ruby-2/calendar.rb:2: 
warning: method redefined; discarding old initialize
/Users/seacreature/devel/practicing-ruby-2/calendar.rb:2: 
warning: previous definition of initialize was here
/Users/seacreature/devel/practicing-ruby-2/calendar.rb:8: 
warning: method redefined; discarding old to_s
/Users/seacreature/devel/practicing-ruby-2/calendar.rb:8:
warning: previous definition of to_s was here
August 2012
Su Mo Tu We Th Fr Sa
      1  2  3  4
5  6  7  8  9 10 11
12 13 14 15 16 17 18
19 20 21 22 23 24 25
26 27 28 29 30 31

If you remember that Ruby classes and modules are permanently open to modification, these warnings should make a lot of sense. The first time we called load(), it defined the initialize() and to_s() methods for the Calendar class. The second time we called load(), that class and its methods already existed, so it redefined them. This is not necessarily a sign of a bug, but Ruby warns you of the possibility that it might be.

Ultimately, these warnings are Ruby telling you that there is probably a better way for you to do what you're trying to do. One interesting way to get around the problem is to use Kernel#load()'s wrap functionality. Rather than telling you directly how it works, I'm going to show you by example and see if you can guess what's going on.

Suppose we kill our irb session and fire up a new one; we're now back to a blank slate. We then run the following code and see the familiar calendar output:

>> load "./calendar.rb", true
    August 2012
Su Mo Tu We Th Fr Sa
          1  2  3  4
 5  6  7  8  9 10 11
12 13 14 15 16 17 18
19 20 21 22 23 24 25
26 27 28 29 30 31

Then we decide that we want to look a little deeper into the future so that we know what to plan for in AD 2101. We reload the code using the same command as before:

>> load "./calendar.rb", true
    August 2101
Su Mo Tu We Th Fr Sa
    1  2  3  4  5  6
 7  8  9 10 11 12 13
14 15 16 17 18 19 20
21 22 23 24 25 26 27
28 29 30 31

This time, we don't see any warnings, so obviously something has changed. Here's a clue:

>> Calendar
NameError: uninitialized constant Object::Calendar
  from (irb):2
  from /.../.rvm/rubies/ruby-1.9.2-p180/bin/irb:16:in `<main>'

Surely the Calendar class must have been defined somewhere, because the program worked as expected. So what is going on here? Take a look at the following code; it should give you a clearer picture of what is happening:

def fake_load(file)
  Module.new.module_eval(File.read(file))
  true
end

In this implementation, our approximation of load() is evaluating the loaded code in the context of an anonymous module, which essentially wraps everything its own namespace. This step prevents any of the constants defined in the loaded code from being defined within the global namespace, including any class or module definitions.

The existence of this option is a hint that although load() is suitable for code loading, it is geared more to implementing customized runners for Ruby code than to simply loading the classes and modules in your projects. So if you've been using load() on a daily basis, you might be using the wrong tool for the job at least some of the time. It should be clear by the end of this article why that is the case.

Now that we have looked at the most simple code loading behavior Ruby has to offer, we will jump straight into the deep end and explore one of its most complex options: loading code on demand via Kernel#autoload.

Kernel#autoload

Regardless of whether you've used it explicitly in your own projects, the concept of automatically loading code on demand should be familiar to anyone familiar with Rails. In Rails, none of the classes or modules you define are loaded until the first time they are referenced in your running program. There are two main benefits to this design: faster startup time and delayed loading of optional dependencies.

Rails uses its own customized code to accomplish this result, but the basic idea is similar to what can be done with Ruby's autoload() method. To illustrate how autoload() works, let's revisit our Calendar class that we began building while discussing load(). This time, we have a file called calendar.rb that contains only the definition of the Calendar class, not the code that actually calls methods on it:

class Calendar
  def initialize(month, year)
    @month = month
    @year  = year
  end
  # A simple wrapper around the *nix cal command.
  def to_s
    IO.popen(["cal", @month.to_s, @year.to_s]) { |io| io.read }
  end
end

The following irb session demonstrates the behavior of autoload().

>> autoload(:Calendar, "./calendar.rb") #1
=> nil
>> defined?(Calendar)                   #2
=> nil
>> puts Calendar.new(8,2011)            #3
    August 2011
Su Mo Tu We Th Fr Sa
    1  2  3  4  5  6
 7  8  9 10 11 12 13
14 15 16 17 18 19 20
21 22 23 24 25 26 27
28 29 30 31
=> nil
>> defined?(Calendar)                   #4
=> "constant"

In our first step, we set up the autoload() hook, instructing Ruby to load the file calendar.rb at the time that the first constant lookup happens for the Calendar constant. In the second step, we check to ensure that autoload() does not actually load the file for you automatically by verifying that Calendar has not yet been defined. Then, in our third step, we build and output our Calendar. Last, we see that the constant is now defined.

This exposes us to some cool Ruby voodoo while also raising a lot of questions. It may help to approximate how autoload() might be implemented in order to wrap your head around the idea. Although the following code is evil and should never be used for anything but educational purposes, it simulates the load on demand behavior nicely.

$load_hooks = Hash.new
module Kernel
  def fake_autoload(constant_name, file_name)
    $load_hooks[constant_name] = file_name
  end
end
def Object.const_missing(constant)
  load $load_hooks[constant]
  const_get(constant)
end
fake_autoload :Calendar, "./calendar.rb"
p defined?(Calendar)
puts Calendar.new(8,2011)
p defined?(Calendar)

After reading the previous example code and playing with it a bit, remember the dependency on const_missing() and forget pretty much everything else about the implementation. The real autoload() handles a lot more cases than this trivial example gives it credit for.

With the const_missing() dependency in mind, try to guess what will happen when the following code is run:

class Calendar; end
autoload :Calendar, "./calendar.rb"
p defined?(Calendar)
puts Calendar.new(8,2011)
p defined?(Calendar)

If you guessed that it didn't output a nicely formatted calendar, you guessed correctly. Below you can see that when I run this script, all the code in calendar.rb never gets loaded, so the default Object#initialize and Object#to_s are being called instead:

"constant"
<Calendar:0x0000010086d6b0>
"constant"

Because autoload() does not check to see whether a constant is already defined when it registers its hook, you do not get an indication that the calendar.rb file was never loaded until you actually try to use functionality defined in that file. Thus autoload() is safe to use only when there is a single, uniform place where a constant is meant to be defined; it cannot be used to incrementally build up class or module definitions from several different source files.

This sort of rigidity is frustrating, because unlike load(), which does not care how or where you define your code, autoload() is much more opinionated. What you've seen here is a single example of the constraints it puts on you, but it is easy to imagine other scenarios in which autoload() can feel like a brittle way to load code. I'll leave it up to you to try to figure out some of those issues, but feel free to ask me for some hints if you get stumped.

In the context of Rails—particularly when working in development mode, in which the whole environment gets reloaded on every request—some form of automatic loading makes sense. However, outside of that environment, the drawbacks of autoload() tend to outweigh the benefits, so most Ruby projects tend to avoid it entirely by making heavy use of require().

Kernel#require()

If you've written any code at all outside of Rails, odds are you've used require() before. It is actually quite similar to load() but has a few additional features that come in handy. To illustrate how require() works, let's revisit our original calendar.rb file, the one that had a bit of code to be executed in the end of it.

class Calendar
  def initialize(month, year)
    @month = month
    @year  = year
  end
  # A simple wrapper around the *nix cal command.
  def to_s
    IO.popen(["cal", @month.to_s, @year.to_s]) { |io| io.read }
  end
end
puts Calendar.new(8, 2011)

If we attempt to load this code twice via require(), we immediately see an important way in which it differs from load().

>> require "./calendar.rb" #1
    August 2011
Su Mo Tu We Th Fr Sa
    1  2  3  4  5  6
 7  8  9 10 11 12 13
14 15 16 17 18 19 20
21 22 23 24 25 26 27
28 29 30 31
=> true
>> require "./calendar.rb" #2
=> false

When I ran require() the first time, the familiar calendar output greeted me, and then the function returned a true value. The second time I ran it, nothing happened and the function returned false. This is a feature, and not a bug. The following code is a crude approximation of what is going on under the hood in require():

$LOADED_BY_FAKE_REQUIRE = Array.new
def fake_require(file)
  full_path = File.expand_path(file)
  return false if $LOADED_BY_FAKE_REQUIRE.include?(full_path)
  load full_path
  $LOADED_BY_FAKE_REQUIRE << full_path
  return true
end

This behavior ensures that each file loaded by require() is loaded exactly once, even if the require() calls appear in many places. Therefore, updates to those files will take effect after they have been loaded once. Although this behavior makes require() less suitable than load() for quick exploratory code loading, it does prevent programs from needlessly reloading the same code again and again, similar to how autoload() works once a constant has been loaded.

Another interesting property of require() is that you can omit the file extension when loading your code. Thus require("./calendar") will work just as well as require("./calendar.rb"). Though this may seem like a small feature, the reason it exists is that Ruby can load more than just Ruby files. When you omit an extension on a file loaded with require(), it will attempt to load the file with the ".rb" extension first, but will then cycle through the file extensions used by C extensions as well, such as ".so", ".o", and ".dll". Despite being an obscure property, it's one that we often take for granted when we load certain standard libraries or third-party gems. This behavior is another detail that separates require() from load(), as the latter can work only with explicit file extensions.

The main benefit of using require() is that it provides the explicit, predictable loading behavior of load() with the caching functionality of autoload(). It also feels natural for those who use RubyGems, as the standard way of loading libraries distributed as gems is via the patched version of Kernel#require() that RubyGems provides.

Using require() will take you far, but it suffers from a pretty irritating problem—shared by load() and autoload()—with the way it looks up files. The require_relative() is meant to solve that problem, so we'll take a look at it now.

Kernel#require_relative()

Each time I referenced files using a relative path in the previous examples, I wrote the path to explicitly reference the current working directory. If you're used to using Ruby 1.8, this may come as a surprise to you. If you've been using Ruby 1.9.2, it may or may not appear to be the natural thing to do. However, now is the time when I confess that it's almost always the wrong way to go about things.

Ruby 1.9.2 removes the current working directory from your path by default for security reasons. So, in our previous example, if we attempted to write require("calendar") instead of require("./calendar"), it would fail on Ruby 1.9.2 even if we invoked irb in the same folder as the calendar.rb file. Explicitly referencing the current working directory works on both Ruby 1.8.7 and Ruby 1.9.2, which is why this convention was born. Unfortunately, it is an antipattern, because it forces us to assume that our code will be run from a particular place on the file system.

Imagine a more typically directory structure, such as this:

lib/
  calendar.rb
  calendar/
    month.rb
    year.rb
bin/
  calendar.rb

We could have a bin/ruby_calendar.rb file that looks like this code:

require "lib/calendar"
case ARGV.size
when 2
  puts Calendar::Month.new(ARGV[0], ARGV[1])
when 1
  puts Calendar::Year.new(ARGV[0])
else
  raise "Invalid arguments"
end

Similarly, our lib/calendar.rb file might include require() calls such as these:

require "lib/calendar/year"
require "lib/calendar/month"

Now if we run bin/ruby_calendar.rb from the project root, things will work as expected.

$ ruby bin/ruby_calendar.rb 2011
# ...

But if we ran this file from any other directory, it'd fail to work as expected because the relative paths would be evaluated relative to wherever you executed the files from, not relative to where the files live on the file system. That is, if you execute ruby_calendar.rb in the bin/ folder, it would look for a file called bin/lib/calendar.rb.

One way to solve this problem is to use the same mechanism that the Ruby standard library and RubyGems uses: modify the loadpath.

In bin/ruby_calendar.rb, we rewrite our code to match this:

$LOAD_PATH.unshift("#{File.dirname(__FILE__)}/../lib")
require "calendar"
case ARGV.size
when 2
  puts Calendar::Month.new(ARGV[0], ARGV[1])
when 1
  puts Calendar::Year.new(ARGV[0])
else
  raise "Invalid arguments"
end

Because we've added the lib/ folder to the lookup path for all require() calls in our application, we can modify lib/calendar.rb to match the following:

require "calendar/year"
require "calendar/month"

This approach makes it possible to run the ruby_calendar.rb program from any location within the file system, as long as we tell ruby where to find it. That means you can run it directly from within the bin/ folder, or even with an absolute path.

# NOTE: this is common in cron jobs.
$ ruby /Users/seacreature/devel/ruby_calendar/bin/ruby_calendar.rb

This approach works, and was quite common in Ruby for some time. Then, people began to get itchy about it, because it is definitely overkill. It effectively adds an entire folder to the $LOAD_PATH, giving Ruby one more place it has to look on every require and possibly leading to unexpected naming conflicts between libraries.

The solution to that problem is to not mess with the $LOAD_PATH in your code. Therefore, you expect either that the $LOAD_PATH variable will be properly set by the -I flag when you invoke ruby or irb, or that you have to write code that dynamically determines the proper relative paths to require based on your current working directory. The latter approach requires less effort from the end user but makes your code ugly. Below you'll see what people resorted to on Ruby 1.8 before a better solution came along:

# bin/ruby_calendar.rb
require "#{File.dirname(__FILE__)}/../lib/calendar"
case ARGV.size
when 2
  puts Calendar::Month.new(ARGV[0], ARGV[1])
when 1
  puts Calendar::Year.new(ARGV[0])
else
  raise "Invalid arguments"
end
# lib/calendar.rb
require "#{File.dirname(__FILE__)}/calendar/year"
require "#{File.dirname(__FILE__)}/calendar/month"

Using this approach, you do not add anything to the $LOAD_PATH but instead dynamically build up relative paths by referencing the __FILE__ variable and getting a path to the directory it's in. This code will evaluate to different values depending on where you run it from, but in the end, the right path will be produced and things will just work.

Predictably, people took efforts to hide this sort of ugliness behind helper functions, and one such function was eventually adopted into Ruby 1.9. That helper is predictably called require_relative(). Using require_relative(), we can simplify our calls significantly while preserving the "don't touch the $LOAD_PATH variable" ethos:

# bin/ruby_calendar.rb
require_relative "../lib/calendar"
case ARGV.size
when 2
  puts Calendar::Month.new(ARGV[0], ARGV[1])
when 1
  puts Calendar::Year.new(ARGV[0])
else
  raise "Invalid arguments"
end
# lib/calendar.rb
require_relative "calendar/year"
require_relative "calendar/month"

This code looks and feels like it would work in the way that we'd like to think require() would work. The files we reference are relative to the file in which the actual calls are made, rather than the folder in which the script was executed in. For this reason, it is a much better approach than pretty much anything I've shown so far.

Of course, it is not a perfect solution. In some cases, it does not work as expected, such as in Rackup files. Additionally, because it's a Ruby 1.9 feature, it's not built into Ruby 1.8.7. The former issue cannot be worked around, but the latter can be. I'll go into a bit more detail about both of these issues in the recommendations section, which is coming up right now.

Conventions and Recommendations

If you remember one thing from this article, it should be that whenever it's possible to use require_relative() and there isn't an obviously better solution, it's probably the right tool to reach for. It has the fewest dark corners and pretty much just works.

That said, take my advice with a grain of salt. I no longer actively maintain any Ruby 1.8 applications, nor do I have to deal with code that must run on both Ruby 1.8 and 1.9. If I were in those shoes again, I'd weigh out four different possible ways of approaching things:

1) Explicitly use require() with the File.dirname(__FILE__) hack

2) Write my own require_relative() implementation leaning on the previous hack that gets defined only if require_relative() isn't already implemented

3) Add a dependency for Ruby 1.8 only on the require_relative gem

4) Assume that $LOAD_PATH is set for me via the -I flag on execution, or some other means, and then write ordinary require calls relative to the lib/ folder in my project.

I can't give an especially good picture of when I'd pick one of those options over the other, because it's been about a year since I've last had to think about it. But any of those four options seem like at least reasonable ideas. I would not employ the common but painfully ugly require("./file_in_the_working_dir.rb") hack in any code that I expected to use for anything more than a spike or demonstration.

Whether using require_relative() explicitly, or one of the workarounds listed above, I like to use some form of relative require whenever I can. Occasionally, I do use load(), particularly in spikes where I want to reload files into an irb session without restarting irb. But I don't think that load() ends up in production code of mine unless there is a very good reason to use it. A possible good reason would be if I were building some sort of script runner, such as what you could find in Rails when it reloads your development environment or in autotest. In the autotest case in particular in which your test files are reloaded each time you make an edit to any of your files in your project, it seems that using load() with its obscure second parameter is a good idea. But these are not tools I'd expect to be building on a daily basis, so load() remains somewhat of an obscure tool for me.

I never use autoload(). I've just not run into the issues that some folks in Rails experience regarding slow startup times of applications in any way that has mattered to me. I feel like the various gotchas that come along with using autoload() and the strict conventions it enforces are not good things to impose on general-purpose uses of Ruby. I don't know whether I think that it makes sense in to context of Rails, but that's a very different question than whether it should be used in ordinary Ruby applications and libraries. It makes at least some sense in Rails, but in most Ruby applications, it does not. The only time I might think about looking into autoload() is if I had some sort of optional dependency that I wanted to be loaded only on demand. I have never actually run into that issue, and I've found that the following hack provides a way to do optional dependencies that seems to work just fine:

begin
  require "some_external_dependency"
  require "my_lib/some_feature_that_depends_on_dependency"
rescue LoadError
  warn "Could not load some_external_dependency."+
       " Some features are disabled"
end

But really, optional dependencies are things I very rarely need to think about. There are valid use cases for them, but unless something is very difficult to install or your project is specifically meant to wrap various mutually exclusive dependencies, I typically will just load up all my dependencies regardless of whether the user ends up using them. This policy has not caused me problems, but your mileage will vary depending on the type of work you are doing.

On a somewhat tangential note, I try to avoid things like dynamic require calls in which I walk over a file list generated from something like Dir.glob() or the like. I also avoid using Bundler.require(), even when I use bundler. The reason I avoid these things is because I like to be able control the exact order in which my files and my dependencies are being loaded. It's possible to not have to worry about this sort of thing, but doing so requires a highly disciplined way of organizing your code so that files can be loaded independently.

Questions / Feedback

I hope this background story about the various ways to load code along with the few bits of advice I've offered in the end here have been useful to you. I am happy to answer whatever questions you have; just leave a comment below.

来源:https://practicingruby.com/articles/ways-to-load-code

    相关文章:

发表评论:

◎欢迎参与讨论,请在这里发表您的看法、交流您的观点。